Abstract. We study alternating register automata on data words and data trees in relation
to logics. A data word (resp. data tree) is a word (resp. tree) whose every position
carries a label from a finite alphabet and a data value from an infinite domain. We in- 
vestigate one-way automata with alternating control over data words or trees, with one 
register for storing data and comparing them for equality. This is a continuation of the 
study started by Demri, Lazić and Jurdziński.

From the standpoint of register automata models, this work aims at two objectives: 
(1) simplifying the existing decidability proofs for the emptiness problem for alternating
register automata; and (2) exhibiting decidable extensions for these models. 

From the logical perspective, we show that (a) in the case of data words, satisfiability 
of LTL with one register and quantification over data values is decidable; and (b) the 
satisfiability problem for the so-called forward fragment of XPath on XML documents is 
decidable, even in the presence of DTDs and even of key constraints. The decidability is 
obtained through a reduction to the automata model introduced. This fragment contains 
the child, descendant, next-sibling and following-sibling axes, as well as data equality and 
inequality tests. 



1. Introduction 

In static analysis of databases, as in software verification, we frequently need to reason
about infinite alphabets. In program verification one may need to decide statically
whether a program satisfies some given specification; and the necessity of dealing with infi- 
nite alphabets can arise from different angles. For example, in the presence of concurrency, 
suppose that an unbounded number of processes run, each one with its process identifica- 
tion, and we must deal with properties expressing the interplay between these processes. 
Further, procedures may have parameters and data from some infinite domain could be
exchanged as arguments. On the other hand, in the databases context, static analysis on
XML and its query languages recurrently needs to take into account not only the labels of 
the nodes, but also the actual data contained in the attributes. It is hence important to 
study formalisms to reason with words or trees that can carry elements from some infinite 
domain. 

This work is about decidable alternating register automata models on data words and 
data trees in relation to logics manipulating data. This is a continuation of the investigation
carried out by Demri, Lazić and Jurdziński [8, 23]. A nontrivial consequence of
our contribution on alternating automata is that the satisfiability problem for the forward 
fragment of XPath on XML documents is decidable.

We consider two kinds of data models: data words and data trees. A data word (data 
tree) is a finite word (unranked finite tree) whose every position carries a pair of elements: 
a symbol from a finite alphabet and an element (a datum) from an infinite set (the
data domain). We work on finite structures, and all the results we present are relative to 
finite words and trees. 

Over these two models we consider two formalisms: alternating register automata on 
the one hand, and logics on the other. Each automata model is related to a logic, in the 
sense that the satisfiability of the logic can be reduced to the emptiness of the automata 
model. Both automata models we present (one for data words, the other for data trees) have 
in essence the same behavior. Let us give a more detailed description of these formalisms. 

Automata. The automata model we define is based on the ARA model (for Alternating 
Register Automata) of [8] in the case of data words, or the ATRA model (for Alternating
Tree Register Automata) of [24] in the case of data trees. ARA are one-way automata 
with alternating control and one register to store data values for later comparison. ATRA 
correspond to a natural extension of ARA over data trees. The ATRA model can move in 
two directions: to the leftmost child, and/or to the next sibling to the right. Both models 
were shown to have a decidable emptiness problem. The proofs of decidability are based on 
non trivial reductions to a class of decidable counter automata with faulty increments. 

In the present work, decidability of these models is shown by interpreting the seman- 
tics of the automaton in the theory of well-quasi-orderings in terms of a well-structured 
transition system (see [19]). The object of this alternative proof is twofold. On the one
hand, we propose a direct, unified and self-contained proof of the main decidability results 
of [8, 23]. Whereas in [8, 23] decidability results are shown by reduction to a class of faulty
counter automata, here we avoid such translation, and show decidability directly interpret- 
ing the configurations of the automata in the theory of well-structured transition systems. 
We stress, however, that the underlying techniques used here are similar to those of [8, 24].
On the other hand, we further generalize these results. Our proof can be easily extended to 
show the decidability of the nonemptiness problem for two powerful extensions. These ex- 
tensions consist in the following abilities: (a) the automaton can nondeterministically guess 
any data value of the domain and store it in the register; and (b) it can make a certain 
kind of universal quantification over the data values seen along the run of the automaton, 
in particular over the data values seen so far. We name these extensions guess and spread 
respectively. These extensions can be both added to the ARA model or to the ATRA model, 
since our proofs for ARA and ATRA emptiness problems share the same core. We call the 
class of alternating register automata with these extensions ARA(guess, spread) in the
case of data words, or ATRA(guess, spread) in the case of data trees. We demonstrate that
these extensions are also decidable if the data domain is equipped with a linear order and
the automata model is extended accordingly. Further, these models are powerful enough to 
decide a large fragment of XPath, and of a temporal logic with registers. 

XPath. This work is originally motivated by the increasing importance of reasoning tasks
about XML documents. An XML document can be seen as an unranked ordered tree where
each node carries a label from a finite alphabet and a set of attributes, each with an associated
datum from some infinite domain. A data tree is a simplification of an XML document that
happens to be equivalent for the problems treated here.

XPath is arguably the most widely used XML node-selecting language, part of XQuery
and XSLT; it is an open standard and a W3C Recommendation [7]. Static analysis on XML
languages is crucial for query optimization tasks, consistency checking of XML specifications,
type checking transformations, or many applications on security. Among the most
important problems are those of query equivalence and query containment. By answering 
these questions we can decide at compile time whether the query contains a contradiction, 
and thus whether the computation of the query on the document can be avoided, or if one 
query can be safely replaced by a simpler one. For logics closed under boolean combina- 
tion, these problems reduce to satisfiability checking, and hence we focus on this problem. 
Unfortunately, the satisfiability problem for XPath with data tests is undecidable [20] even
when the data domain has no structure (i.e., where the only data relation available is the
test for equality or inequality). It is then natural to identify and study decidable expressive
fragments. 

In this work we prove that the forward fragment of XPath has a decidable satisfiability 
problem by a reduction to the nonemptiness problem of ATRA(guess, spread). Let us describe 
this logic. Core-XPath [21] is the fragment of XPath that captures all the navigational
behavior of XPath. It has been well studied and its satisfiability problem is known to be
decidable in ExpTime in the presence of DTDs [27]. We consider an extension of this
language with the possibility to make equality and inequality tests between attributes of
XML elements. This logic is named Core-Data-XPath in [5], and its satisfiability problem
is undecidable [20]. Here we address a large fragment of Core-Data-XPath named 'forward
XPath', that contains the 'child', 'descendant', 'self-or-descendant', 'next-sibling',
'following-sibling', and 'self-or-following-sibling' axes. For economy of space we refer to these axes as
$\downarrow$, $\downarrow^+$, $\downarrow^*$, $\rightarrow$, $\rightarrow^+$, $\rightarrow^*$ respectively. Note that $\rightarrow^+$ and $\rightarrow^*$ are interdefinable in the presence of
$\rightarrow$, and similarly with $\downarrow^+$ and $\downarrow^*$. We then refer to this fragment as XPath($\downarrow, \downarrow^*, \rightarrow, \rightarrow^*, =$),
where '=' is to indicate that the logic can express equality or inequality tests between data
values.

Although our automata model does not capture forward-XPath in terms of expressive- 
ness, we show that there is a reduction to the nonemptiness problem of ATRA(guess, spread). 
These automata can recognize any regular language, in particular a DTD, a Relax NG doc- 
ument type, or the core of XML Schema (stripped of functional dependencies). Since we 
show that XPath($\downarrow, \downarrow^*, \rightarrow, \rightarrow^*, =$) can express unary key constraints, it then follows that
satisfiability of forward-XPath in the presence of DTDs and key constraints is decidable. This
settles a natural question left open in [23], where decidability for a restriction of forward- 
XPath was shown. The fragment treated in the cited work is restricted to data tests of 
the form ($\varepsilon = \alpha$) (or ($\varepsilon \neq \alpha$)), that is, formulae that test whether there exists an element
accessible via the $\alpha$-relation with the same (resp. different) data value as the current node
of evaluation. However, the forward fragment allows unrestricted tests ($\alpha = \beta$) and makes
the coding into a register automata model highly nontrivial. As a consequence, we also
answer positively the open question raised in on whether the downward fragment of
XPath in the presence of DTDs is decidable.

Temporal logics. ARA(guess, spread) over data words also yield new decidability results
on the satisfiability for some extensions of the temporal logic with one register denoted
by LTL$^{\downarrow}$(U, X) in [8]. This logic contains a 'freeze' operator to store the current datum
and a 'test' operator to test the current datum against the stored one. Our automata
model captures an extension of this logic with quantification over data values, where we
can express "for all data values in the past, $\varphi$ holds", or "there exists a data value in the
future where $\varphi$ holds". Indeed, neither of these two types of properties can be expressed in
the previous formalisms of [8] and [24]. These quantifiers may be added to LTL$^{\downarrow}$(U, X) over
data words without losing decidability. What is more, decidability is preserved if the data
domain is equipped with a linear order that is accessible by the logic. Also, by a translation
into ATRA(guess, spread), these operators can be added to a CTL version of this logic over
data trees, or to the $\mu$-calculus treated in [23]. However, adding the dual of either of these
operators results in an undecidable logic.

Contribution. From the standpoint of register automata models, our contributions can 
be summarized as follows. 

• We exhibit a unified framework to show decidability for the emptiness problem for alter- 
nating register automata. This proof has the advantage of working both for data words 
and data trees practically unchanged. It is also a simplification of the existing decidability 
proofs. 

• We exhibit decidable extensions for these models of automata. These extensions work 
for automata either running over words or trees. For each of these models there are 
consequences on the satisfiability of some expressive logics. 

From the standpoint of logics, we show the following results. 

• In the case of data trees, we show that the satisfiability problem for the 'forward' fragment 
of XPath with data test equalities and inequalities is decidable, even in the presence of 
DTDs (or any regular language) and unary key constraints. Decidability is shown through 
a reduction to the emptiness problem for ATRA(guess, spread) on data trees. 

• We show that the temporal logic LTL$^{\downarrow}$(U, X) for data words extended with some
quantification over data values is decidable. This result is established thanks to a translation
from formulae to alternating register automata of ARA(guess, spread).



The satisfiability problem on downward XPath but in the absence of DTDs is shown to be ExpTime-complete in

This is the conference version of [24].



Alternating register automata on finite data words and trees 






Related work. The main results presented here first appeared in the conference paper [12].
Here we include the full proofs and the analysis for alternating register automata on data
words (something that was out of the scope of [12]) as well as its relation to temporal logics.
Also, we show how to proceed in the presence of a linear order, maintaining decidability.
The results contained here also appear in the thesis [13, Ch. 3 and 6].

By the lower bounds given in [8], the complexity of the problems we treat in this work
is very high: non-primitive-recursive. This lower bound is also known to hold for the logics
we treat here, LTL$^{\downarrow}$(U, X) and forward-XPath. In fact, even very simple fragments of LTL$^{\downarrow}$
and XPath are known to have non-primitive-recursive lower bounds, including the fragment
of [23] or even much simpler ones without the one-step '$\rightarrow$' shown in [17]. This is
the reason why in this work we limit ourselves to decidability / undecidability results.

The work in [3] investigates the satisfiability problem for many XPath logics, mostly 
fragments without negation or without data equality tests in the absence of sibling axes. 
Also, in [11] there is a study of the satisfiability problem for downward XPath fragments
with and without data equality tests. All these are sub-fragments of forward-XPath, and 
notably, none of these works considers horizontal axes to navigate between siblings. Hence, 
by exploiting the bisimulation invariance property enjoyed by these logics, the complexity 
of the satisfiability problem is kept relatively low (at most ExpTime) in the presence of 
data values. However, as already mentioned, when horizontal axes are present most of 
the problems have a non-primitive-recursive complexity. In [20], several fragments with 
horizontal axes are treated. The only fragment with data tests and negation studied there 
is incomparable with the forward fragment, and it is shown to be undecidable. In [18] it
is shown that the vertical fragment of XPath is decidable in non-primitive-recursive time. 
This is the fragment with both downward and upward axes, but notably no horizontal axes 
(in fact, adding horizontal axes to the vertical fragment results in undecidability). 

The satisfiability of first-order logic with two variables and data equality tests is explored
in [5]. It is shown that FO$^2$ with local one-step relations to move around the data tree
and a data equality test relation is decidable. [5] also shows the decidability of a fragment
of XPath($\uparrow, \downarrow, \leftarrow, \rightarrow, =$) with sibling and upward axes. However, the logic is restricted to
one-step axes and to data formulae of the kind ($\varepsilon = \alpha$) (or $\neq$), while the fragment we treat
here cannot move upwards but has transitive axes and unrestricted data tests.

2. Preliminaries 

Notation. We first fix some basic notation. Let $\wp(C)$ denote the set of subsets of $C$, and
$\wp_{<\infty}(C)$ be the set of finite subsets of $C$. Let $\mathbb{N} = \{1, 2, \dots\}$ be the set of positive integers,
and let $[n] := \{i \mid 1 \le i \le n\}$ for any $n \in \mathbb{N}$. We call $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. We fix once and
for all $\mathbb{D}$ to be any infinite domain of data values; for simplicity in our examples we will
consider $\mathbb{D} = \mathbb{N}_0$. In general we use letters $\mathbb{A}$, $\mathbb{B}$ for finite alphabets, the letter $\mathbb{D}$ for an
infinite alphabet and the letters $\mathbb{E}$ and $\mathbb{F}$ for any kind of alphabet. By $\mathbb{E}^*$ we denote the set
of finite sequences over $\mathbb{E}$ and by $\mathbb{E}^\omega$ the set of infinite sequences over $\mathbb{E}$. We use '$\cdot$' as the
concatenation operator between sequences. We write $|S|$ to denote the length of $S$ (if $S$ is
a sequence), or the cardinality of $S$ (if $S$ is a set).

2.1. Finite words. We consider a finite word over $\mathbb{E}$ as a function $w : [n] \to \mathbb{E}$ for some
$n \in \mathbb{N}$, and we define the set of words as $Words(\mathbb{E}) := \{[n] \to \mathbb{E} \mid n \in \mathbb{N}\}$. We write
$pos(w) = \{1, \dots, n\}$ to denote the set of positions (that is, the domain of $w$), and we
define the size of $w$ as $|w| = n$. Given $w \in Words(\mathbb{E})$ and $w' \in Words(\mathbb{F})$ with $pos(w) =
pos(w') = P$, we write $w \otimes w' \in Words(\mathbb{E} \times \mathbb{F})$ for the word such that $pos(w \otimes w') = P$ and
$(w \otimes w')(x) = (w(x), w'(x))$. A data word is a word of $Words(\mathbb{A} \times \mathbb{D})$, where $\mathbb{A}$ is a finite
alphabet of letters and $\mathbb{D}$ is an infinite domain. Note that we define data words as having
at least one position. This is done for simplicity of the definition of the formalisms we work
with, and the results contained here also extend to possibly-empty data words and trees.
We define the word type as a function $type_w : pos(w) \to \{\triangleright, \bar{\triangleright}\}$ that specifies whether a
position has a next element or not. That is, $type_w(i) = \triangleright$ iff $(i + 1) \in pos(w)$.
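
The definitions of this subsection can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: a data word is modelled as a non-empty list of (letter, datum) pairs, positions are 1-based as in the text, and the names `positions`, `product`, `word_type`, `HAS_NEXT` and `NO_NEXT` are hypothetical choices, not notation from the paper.

```python
# Illustrative sketch (names are assumptions, not the paper's notation).
HAS_NEXT = "has-next"  # stands for the type of a position with a successor
NO_NEXT = "no-next"    # stands for the type of the last position


def positions(w):
    """pos(w) = {1, ..., n} for a word of length n."""
    return range(1, len(w) + 1)


def product(w1, w2):
    """The product word of w1 and w2, defined only when pos(w1) = pos(w2)."""
    if len(w1) != len(w2):
        raise ValueError("both words must have the same set of positions")
    return list(zip(w1, w2))


def word_type(w, i):
    """The word type of position i: does i have a next element?"""
    if i not in positions(w):
        raise IndexError("not a position of the word")
    return HAS_NEXT if (i + 1) in positions(w) else NO_NEXT


# A data word over letters {a, b} and data values from the natural numbers.
letters = ["a", "b", "a"]
data = [5, 7, 5]
w = product(letters, data)
```

Here `w` pairs each letter with its datum, so a data word is literally the product of a word over the finite alphabet and a word over the data domain.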

2.2. Unranked ordered finite trees. We define $Trees(\mathbb{E})$, the set of finite, ordered and
unranked trees over an alphabet $\mathbb{E}$. A position in the context of a tree is an element of
$\mathbb{N}^*$. The root's position is the empty string and we denote it by '$\epsilon$'. The position of any other
node in the tree is the concatenation of the position of its parent and the node's index in
the ordered list of siblings. Throughout this work we use $x, y, z, w, v$ as variables for positions,
and $i, j, k, l, m, n$ as variables for numbers. Thus, for example, $x \cdot i$ is a position which is
not the root, that has $x$ as parent position, and that has $i - 1$ siblings to its left.

Formally, we define $POS \subseteq \wp_{<\infty}(\mathbb{N}^*)$, the set of sets of finite tree positions, such that
$X \in POS$ iff (a) $X \subseteq \mathbb{N}^*$, $|X| < \infty$; (b) it is prefix-closed; and (c) if $n \cdot (i + 1) \in X$ for $i \in \mathbb{N}$,
then $n \cdot i \in X$. A tree is a mapping from a set of positions to letters of the alphabet

$Trees(\mathbb{E}) := \{t : P \to \mathbb{E} \mid P \in POS\}$.
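
Conditions (b) and (c) can be checked mechanically. The following Python sketch is an illustration only: positions are encoded as tuples of positive integers with the root as the empty tuple, and the function name `is_position_set` is a hypothetical choice. Condition (a) holds automatically because a Python set is finite.

```python
def is_position_set(xs):
    """Check conditions (b) and (c) on a finite set of positions.

    Each position is a tuple of positive integers; the root is ().
    """
    for x in xs:
        if any(i < 1 for i in x):
            return False  # indices are positive integers
        if x and x[:-1] not in xs:
            # (b) prefix-closed: checking the parent of every element
            # suffices, since every proper prefix is reached by induction.
            return False
        if x and x[-1] > 1 and x[:-1] + (x[-1] - 1,) not in xs:
            # (c) a node with index i + 1 requires its left sibling i.
            return False
    return True
```

For instance, `{(), (1,), (2,), (1, 1)}` passes the check, whereas `{(), (2,)}` violates condition (c) because the second child appears without the first.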

Given a tree $t \in Trees(\mathbb{E})$, $pos(t)$ denotes the domain of $t$, which consists of the set of
positions of the tree, and $alph(t) = \mathbb{E}$ denotes the alphabet of the tree. From now on, we
informally write 'node' to denote a position $x$ together with the value $t(x)$. We define the
ancestor partial order $\preceq$ as the prefix relation $x \preceq x \cdot y$ for every $x \cdot y$, and the strict version
$\prec$ as the strict prefix relation $x \prec x \cdot y$ for $|y| > 0$. Given a tree $t$ and $x \in pos(t)$, '$t|_x$'
denotes the subtree of $t$ at position $x$. That is, $t|_x : \{y \mid x \cdot y \in pos(t)\} \to alph(t)$ where
$t|_x(y) = t(x \cdot y)$. In the context of a tree $t$, a siblinghood is a maximal sequence of siblings.
That is, a sequence of positions $x \cdot 1, \dots, x \cdot l \in pos(t)$ such that $x \cdot (l + 1) \notin pos(t)$.

Given two trees $t_1 \in Trees(\mathbb{E})$, $t_2 \in Trees(\mathbb{F})$ such that $pos(t_1) = pos(t_2) = P$, we
define $t_1 \otimes t_2 : P \to (\mathbb{E} \times \mathbb{F})$ as $(t_1 \otimes t_2)(x) = (t_1(x), t_2(x))$.

The set of data trees over a finite alphabet $\mathbb{A}$ and an infinite domain $\mathbb{D}$ is defined
as $Trees(\mathbb{A} \times \mathbb{D})$. Note that every tree $t \in Trees(\mathbb{A} \times \mathbb{D})$ can be decomposed into two trees
$a \in Trees(\mathbb{A})$ and $d \in Trees(\mathbb{D})$ such that $t = a \otimes d$. Figure 1 shows an example of a data
tree. We define the tree type as a function $type_t : pos(t) \to \{\triangledown, \bar{\triangledown}\} \times \{\triangleright, \bar{\triangleright}\}$ that specifies
whether a node has children and/or siblings to the right. That is, $type_t(x) := (a, b)$ where
$a = \triangledown$ iff $x \cdot 1 \in pos(t)$, and where $b = \triangleright$ iff $x = x' \cdot i$ and $x' \cdot (i + 1) \in pos(t)$. The notation
for the set of data values used in a data tree is

$data(a \otimes d) := \{d(x) \mid x \in pos(d)\}$.

We abuse notation and write $data(X)$ to refer to all the elements of $\mathbb{D}$ contained in $X$, for
whatever object $X$ may be.

Figure 1: A data tree.

First-child and next-sibling coding. We will make use of the first-child and next-sibling 
underlying binary tree of an unranked tree t. This is the binary tree whose nodes are the 
same as in t, a node x is the left child of y if x is the leftmost child of y in t, and a node x 
is the right child of y if x is the next sibling to the right of y in t. We will sometimes refer 
to this tree by "the fcns coding of $t$". Let us define the relation between positions $\preceq_{fcns}$
such that $x \preceq_{fcns} y$ if $x$ is the ancestor of $y$ in the first-child and next-sibling coding of the
tree, that is, if y is reachable from x by traversing the tree through the operations 'go to 
the leftmost child' and 'go to the next sibling to the right'. 
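
As an illustration, the two navigation operations and the induced reachability relation can be sketched in Python. This is a sketch under assumptions: positions are tuples of positive integers (root is the empty tuple), the relation is treated as reflexive, and the function names are hypothetical.

```python
def fcns_children(x, positions):
    """Left and right child of x in the fcns coding (None when absent)."""
    left = x + (1,)                                # leftmost child in t
    right = x[:-1] + (x[-1] + 1,) if x else None   # next sibling in t
    return (left if left in positions else None,
            right if right in positions else None)


def fcns_ancestor(x, y, positions):
    """True iff y is reachable from x by 'leftmost child' and
    'next sibling' moves (assumed reflexive: x reaches itself)."""
    frontier = [x]
    while frontier:
        z = frontier.pop()
        if z == y:
            return True
        frontier.extend(c for c in fcns_children(z, positions)
                        if c is not None)
    return False


# A tree whose root has two children, the first of which has two children.
P = {(), (1,), (2,), (1, 1), (1, 2)}
```

Note that in this coding the first child `(1,)` is an fcns-ancestor of its right sibling `(2,)`, although neither is an ancestor of the other in the unranked tree.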

2.3. Properties of languages. We use the standard definition of language. Given an
automaton $\mathcal{A}$ over data words (resp. over data trees), let $\mathcal{L}(\mathcal{A})$ denote the set of data words
(resp. data trees) that have an accepting run on $\mathcal{A}$. We say that $\mathcal{L}(\mathcal{A})$ is the language of
words (resp. trees) recognized by $\mathcal{A}$. We extend this definition to a class of automata $\mathscr{A}$:
$\mathcal{L}(\mathscr{A}) = \{\mathcal{L}(\mathcal{A}) \mid \mathcal{A} \in \mathscr{A}\}$, obtaining a class of languages.

Analogously, given a formula $\varphi$ of a logic $\mathscr{L}$ over data words (resp. data trees) we
denote by $\mathcal{L}(\varphi)$ the set of words (resp. trees) verified by $\varphi$. This is also extended to a logic
$\mathcal{L}(\mathscr{L}) = \{\mathcal{L}(\varphi) \mid \varphi \in \mathscr{L}\}$.

We say that a class of automata (resp. a logic) $\mathscr{A}$ is at least as expressive as another
class (resp. logic) $\mathscr{B}$ iff $\mathcal{L}(\mathscr{B}) \subseteq \mathcal{L}(\mathscr{A})$. If additionally $\mathcal{L}(\mathscr{B}) \neq \mathcal{L}(\mathscr{A})$ we say that $\mathscr{A}$ is
more expressive than $\mathscr{B}$.

We say that a class of automata $\mathscr{A}$ captures a logic $\mathscr{L}$ iff there exists a translation
$t : \mathscr{L} \to \mathscr{A}$ such that for every $\varphi \in \mathscr{L}$ and model (i.e., a data tree or a data word) $m$, we
have that $m \models \varphi$ if and only if $m \in \mathcal{L}(t(\varphi))$.

2.4. Well-structured transition systems. The main argument for the decidability of 
the emptiness of the automata models studied here is based on the theory of well-quasi- 
orderings. We interpret the automaton's run as a transition system with some good proper- 
ties, and this allows us to obtain an effective procedure for the emptiness problem. This is 
known in the literature as a well-structured transition system (WSTS) (see [19]). Next,
we reproduce some standard definitions and known results that we will make use of. 

Definition 2.1. For a set $S$, we define $(S, \le)$ to be a well-quasi-order (wqo) iff '$\le$' $\subseteq S \times S$
is a relation that is reflexive, transitive and for every infinite sequence $w_1, w_2, \dots \in S^\omega$ there
are two indices $i < j$ such that $w_i \le w_j$.

Dickson's Lemma ([10]). Let $\le_k \subseteq \mathbb{N}_0^k \times \mathbb{N}_0^k$ such that $(x_1, \dots, x_k) \le_k (y_1, \dots, y_k)$ iff
$x_i \le y_i$ for all $i \in [k]$. For all $k \in \mathbb{N}_0$, $(\mathbb{N}_0^k, \le_k)$ is a well-quasi-order.

Definition 2.2. Given a quasi-order $(S, \le)$ we define the embedding order as the relation
$\sqsubseteq \subseteq S^* \times S^*$ such that $x_1 \cdots x_n \sqsubseteq y_1 \cdots y_m$ iff there exist $1 \le i_1 < \cdots < i_n \le m$ with
$x_j \le y_{i_j}$ for all $j \in [n]$.

Higman's Lemma ([22]). Let $(S, \le)$ be a well-quasi-order. Let $\sqsubseteq \subseteq S^* \times S^*$ be the
embedding order over $(S, \le)$. Then, $(S^*, \sqsubseteq)$ is a well-quasi-order.



Corollary 2.3 (of Higman's Lemma). Let $S$ be a finite alphabet. Let $\sqsubseteq$ be the subword
relation over $S^* \times S^*$ (i.e., $x \sqsubseteq y$ if $x$ is the result of removing some (possibly none) positions
from $y$). Then, $(S^*, \sqsubseteq)$ is a well-quasi-order.

Proof. It suffices to realize that $\sqsubseteq$ is indeed the embedding order over $(S, =)$, which is
trivially a wqo since $S$ is finite. □
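
The embedding order of Definition 2.2 is easy to decide once the underlying order is decidable. The sketch below is illustrative only (the names `embeds` and `is_subword` are assumptions); it uses the standard greedy leftmost matching, which is sound because matching each element as early as possible leaves the longest remaining suffix for the elements that follow.

```python
def embeds(xs, ys, leq):
    """Decide whether xs embeds into ys with respect to the order leq."""
    j = 0
    for x in xs:
        # Skip elements of ys that do not dominate x.
        while j < len(ys) and not leq(x, ys[j]):
            j += 1
        if j == len(ys):
            return False  # no position left to host x
        j += 1            # ys[j] hosts x; continue strictly to its right
    return True


def is_subword(x, y):
    """Subword relation: the embedding order over equality."""
    return embeds(x, y, lambda a, b: a == b)
```

With equality as the underlying order this is exactly the subword relation of Corollary 2.3; with `<=` on integers, `[1, 2]` embeds into `[0, 3, 1, 2]` by mapping 1 to 3 and 2 to the final 2.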

Definition 2.4. Given a transition system $(S, \to)$, and $T \subseteq S$ we define $Succ(T) := \{a' \in
S \mid \exists a \in T \text{ with } a \to a'\}$, and $Succ^*$ to be its reflexive-transitive closure. Given a wqo $(S, \le)$
and $T \subseteq S$, we define the upward closure of $T$ as $\uparrow T := \{a \mid \exists a' \in T, a' \le a\}$, and the
downward closure as $\downarrow T := \{a \mid \exists a' \in T, a \le a'\}$. We say that $T$ is downward-closed
(resp. upward-closed) iff $\downarrow T = T$ (resp. $\uparrow T = T$).
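
To fix intuitions, these operators can be sketched in Python for a transition system given by a successor function. This is only an illustration under assumptions: `step` is a hypothetical function returning the finite list of successors of a state, the search terminates only when finitely many states are reachable, and closures are represented implicitly through membership tests against a finite set.

```python
from collections import deque


def succ(T, step):
    """Succ(T): the one-step successors of the states in T."""
    return {b for a in T for b in step(a)}


def succ_star(T, step):
    """Succ*(T), computed by breadth-first search.

    Terminates only if finitely many states are reachable from T.
    """
    seen = set(T)
    queue = deque(T)
    while queue:
        a = queue.popleft()
        for b in step(a):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def in_upward_closure(a, T, leq):
    """Membership of a in the upward closure of the finite set T."""
    return any(leq(t, a) for t in T)


def in_downward_closure(a, T, leq):
    """Membership of a in the downward closure of the finite set T."""
    return any(leq(a, t) for t in T)
```

Representing a closed set by a finite generating set and a membership test, rather than enumerating it, is what makes such sets manageable when the state space is infinite.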

Definition 2.5. We say that a transition system $(S, \to)$ is finitely branching iff $Succ(\{a\})$
is finite for all $a \in S$. If $Succ(\{a\})$ is also computable for all $a$, we say that $(S, \to)$ is
effective.

Definition 2.6 (rdc). A transition system $(S, \to)$ is reflexive downward compatible
(or rdc for short) with respect to a wqo $(S, \le)$ iff for every $a_1, a_2, a'_1 \in S$ such that $a'_1 \le a_1$
and $a_1 \to a_2$, there exists $a'_2 \in S$ such that $a'_2 \le a_2$ and either $a'_1 \to a'_2$ or $a'_1 = a'_2$.



Our forthcoming decidability results on alternating register automata of Sections 3.2
and 5.1 will be shown as a consequence of the following propositions.



Proposition 2.7 ([19, Proposition 5.4]). If $(S, \le)$ is a wqo and $(S, \to)$ a transition system
such that (1) it is rdc, (2) it is effective, and (3) $\le$ is decidable; then for any finite $T \subseteq S$
it is possible to compute a finite set $U \subseteq S$ such that $\uparrow U = \uparrow Succ^*(T)$.



Lemma 2.8. Given $(S, \le, \to)$ as in Proposition 2.7, a recursive downward-closed set $V \subseteq S$,
and a finite set $T \subseteq S$, the problem of whether there exist $a \in T$ and $b \in V$ such that
$a \to^* b$ is decidable.



Proof. Applying Proposition 2.7, let $U \subseteq S$ be finite, with $\uparrow U = \uparrow Succ^*(T)$. Since $U$ is finite
and $V$ is recursive, we can test for every element $b \in U$ if $b \in V$. On the one hand, if
there is one such $b$, then by definition $b \in \uparrow Succ^*(T)$, or in other words $a \to^* a' \le b$ for
some $a \in T$, $a' \in Succ^*(a)$. But since $V$ is downward-closed, $a' \in V$ and hence the property
is true. On the other hand, if there is no such $b$ in $U$, it means that there is no such $b$ in
$\uparrow U$ either, as $V$ is downward-closed. This means that there is no such $b$ in $Succ^*(T)$ and
hence that the property is false. □

Definition 2.9 ($\le_\wp$). Given an ordering $(S, \le)$, we define the majoring ordering over $\le$
as $(\wp_{<\infty}(S), \le_\wp)$, where $\mathcal{S} \le_\wp \mathcal{S}'$ iff for every $a \in \mathcal{S}$ there is $b \in \mathcal{S}'$ such that $a \le b$.
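
The majoring ordering translates directly into a membership test on finite sets. A minimal Python sketch follows; the names `majored_by` and `componentwise` are hypothetical, and the componentwise order on pairs is used only as an example of an underlying order.

```python
def majored_by(s, s_prime, leq):
    """True iff every element of s is below some element of s_prime."""
    return all(any(leq(a, b) for b in s_prime) for a in s)


def componentwise(x, y):
    """Componentwise order on tuples of equal length (as in Dickson's Lemma)."""
    return all(p <= q for p, q in zip(x, y))
```

Note that the empty set is below every finite set in this ordering, since the universal condition holds vacuously.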

Proposition 2.10. If $(S, \le)$ is a wqo, then the majoring order over $(S, \le)$ is a wqo.

Proof. The fact that this order is reflexive and transitive is immediate from the fact that $\le$
is a quasi-order. The fact of being a well-quasi-order is a simple consequence of Higman's
Lemma. Each finite set $\{a_1, \dots, a_n\}$ can be seen as a sequence of elements $a_1, \dots, a_n$, in any
order. In this context, the embedding order is stricter than the majoring order. In other
words, if $a_1 \cdots a_n \sqsubseteq a'_1 \cdots a'_m$, then $\{a_1, \dots, a_n\} \le_\wp \{a'_1, \dots, a'_m\}$. By Higman's Lemma
the embedding order over $(S, \le)$ is a wqo, implying that the majoring order is as well. □



The following technical proposition will become useful in Section 5.1 for transferring
the decidability results obtained for the class of automata ARA(guess, spread) on data words to
the class ATRA(guess, spread) on data trees.

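Proposition 2.11. Let $\le, \to_1 \subseteq S \times S$, and $\le_\wp, \to_2 \subseteq \wp_{<\infty}(S) \times \wp_{<\infty}(S)$ where $\le_\wp$ is
the majoring order over $(S, \le)$ and $\to_2$ is such that if $\mathcal{S} \to_2 \mathcal{S}'$ then: $\mathcal{S} = \{a\} \cup \hat{\mathcal{S}}$,
$\mathcal{S}' = \{b_1, \dots, b_m\} \cup \hat{\mathcal{S}}$ with $a \to_1 b_i$ for every $i \in [m]$. Suppose that $(S, \le)$ is a wqo which is
rdc with respect to $\to_1$. Then, $(\wp_{<\infty}(S), \le_\wp)$ is a wqo which is rdc with respect to $\to_2$.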



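Proof. The fact that $(\wp_{<\infty}(S), \le_\wp)$ is a wqo is given by Proposition 2.10. The rdc property
with respect to $\to_2$ is simple. Let

$\{a'_1, \dots, a'_\ell\} \cup \mathcal{S}' \le_\wp \{a\} \cup \mathcal{S} \to_2 \{b_1, \dots, b_m\} \cup \mathcal{S}$

with $\ell \ge 0$ and $\mathcal{S}' \le_\wp \mathcal{S}$, $a'_i \le a$ for all $i \in [\ell]$, and $a \to_1 b_i$ for all $i \in [m]$. We show by
induction on $\ell$ that there exists $\mathcal{S}''$ with $\{a'_1, \dots, a'_\ell\} \cup \mathcal{S}' \to_2^* \mathcal{S}''$ and $\mathcal{S}'' \le_\wp \{b_1, \dots, b_m\} \cup \mathcal{S}$.
If $\ell = 0$ and $a$ has no pre-image, then $\mathcal{S}' \le_\wp \mathcal{S}$, and the relation is reflexive compatible since
this means that $\mathcal{S}' \le_\wp \{b_1, \dots, b_m\} \cup \mathcal{S}$.

Suppose now that $\ell > 0$. Note that $a'_\ell \le a$ and $\le$ is rdc with $\to_1$. One possibility is
that for each $a \to_1 b_i$ there is some $b'_i$ such that $a'_\ell \to_1 b'_i$ and $b'_i \le b_i$. In this case we obtain

$\{a'_1, \dots, a'_\ell\} \cup \mathcal{S}' \to_2 \{a'_1, \dots, a'_{\ell-1}\} \cup \{b'_1, \dots, b'_m\} \cup \mathcal{S}'$

where

$\{a'_1, \dots, a'_{\ell-1}\} \cup (\{b'_1, \dots, b'_m\} \cup \mathcal{S}') \le_\wp \{a\} \cup (\{b_1, \dots, b_m\} \cup \mathcal{S}) \to_2 \{b_1, \dots, b_m\} \cup \mathcal{S}$

and $\{b'_1, \dots, b'_m\} \cup \mathcal{S}' \le_\wp \{b_1, \dots, b_m\} \cup \mathcal{S}$. We can then apply the inductive hypothesis on $\ell - 1$
and obtain $\mathcal{S}''$ such that $\{a'_1, \dots, a'_{\ell-1}\} \cup \{b'_1, \dots, b'_m\} \cup \mathcal{S}' \to_2^* \mathcal{S}''$ and $\mathcal{S}'' \le_\wp \{b_1, \dots, b_m\} \cup \mathcal{S}$.
Hence, $\{a'_1, \dots, a'_\ell\} \cup \mathcal{S}' \to_2^* \mathcal{S}''$, obtaining the downward compatibility.

The only case left to analyze is when, for some $a'_\ell \le a \to_1 b_i$, the compatibility is
reflexive, that is, $a'_\ell \le b_i$. In this case we take a reflexive compatibility as well, since $\{a'_\ell\} \cup
\mathcal{S}' \le_\wp \{b_1, \dots, b_m\} \cup \mathcal{S}$. We then apply the inductive hypothesis on $\{a'_1, \dots, a'_{\ell-1}\} \cup \{a'_\ell\} \cup \mathcal{S}'$
in the same way as before. □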



Part 1. Data words

In this first part, we start our study on data words. In Section 3 we introduce our automata
model ARA(guess, spread), and we show that the emptiness problem for these automata
is decidable. This extends the results of [8]. To prove decidability, we adopt a different
approach than the one of [8] that enables us to show the decidability of some extensions,
and to simplify the decidability proofs of Part 2, which can be seen as a corollary of this
proof. In Section 4 we introduce a temporal logic with registers and quantification over
data values. This logic is shown to have a decidable satisfiability problem by a reduction
to the emptiness problem of ARA(guess, spread) automata.

3. ARA MODEL 

An Alternating Register Automaton (ARA) is a one-way automaton on data
words with alternation and one register to store and test data. In [8] it was shown that the
emptiness problem is decidable and non-primitive-recursive. Here, we consider an extension
of ARA with two operators: spread and guess. We call this model ARA(spread, guess).

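Definition 3.1. An alternating register automaton of ARA(spread, guess) is a tuple $\mathcal{A} =
(\mathbb{A}, Q, q_I, \delta)$ such that

• $\mathbb{A}$ is a finite alphabet;

• $Q$ is a finite set of states;

• $q_I \in Q$ is the initial state; and

• $\delta : Q \to \Phi$ is the transition function, where $\Phi$ is defined by the grammar

$a \mid \bar{a} \mid \odot? \mid \mathsf{store}(q) \mid \mathsf{eq} \mid \overline{\mathsf{eq}} \mid q \wedge q' \mid q \vee q' \mid \triangleright q \mid \mathsf{guess}(q) \mid \mathsf{spread}(q, q')$

where $a \in \mathbb{A}$, $q, q' \in Q$, $\odot \in \{\triangleright, \bar{\triangleright}\}$.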

This formalism without the guess and spread transitions is equivalent to the automata model
of [8] on finite data words, where $\triangleright$ is to move to the next position to the right on the data
word, $\mathsf{store}(q)$ stores the current datum in the register and $\mathsf{eq}$ (resp. $\overline{\mathsf{eq}}$) tests that the
current node's value is (resp. is not) equal to the stored one. We call a state $q \in Q$ moving if
$\delta(q) = \triangleright q'$ for some $q' \in Q$.

As this automaton is one-way, we define its semantics as a set of 'threads' for each 
node that progress synchronously. That is, all threads at a node move one step forward 
simultaneously and then perform some non-moving transitions independently. This is done 
for the sake of simplicity of the formalism, which simplifies both the presentation and the 
decidability proof. 

Next we define a configuration and then we give a notion of a run over a data word $w$.
A configuration is a tuple $(i, \alpha, \gamma, \Delta)$ that describes the partial state of the execution at
position $i$. The number $i \in pos(w)$ is the position in the data word $w$, $\gamma = w(i) \in \mathbb{A} \times \mathbb{D}$ is
the current position's letter and datum, and $\alpha = type_w(i)$ is the word type of the position
$i$. Finally, $\Delta \in \wp_{<\infty}(Q \times \mathbb{D})$ is a finite set of active threads, each thread $(q, d)$ consisting
of a state $q$ and the value $d$ stored in the register. We will always denote the set of threads
of a configuration with the symbol $\Delta$, and we write $\Delta(d) = \{q \in Q \mid (q, d) \in \Delta\}$ for
$d \in \mathbb{D}$, $\Delta(q) = \{d \in \mathbb{D} \mid (q, d) \in \Delta\}$ for $q \in Q$. By $\mathcal{C}_{ARA}$ we denote the set of all
configurations. Given a set of threads $\Delta$ we write $data(\Delta) := \{d \mid (q, d) \in \Delta\}$, and
$data((i, \alpha, (a, d), \Delta)) := \{d\} \cup data(\Delta)$. We say that a configuration is moving if for every
$(q, d) \in \Delta$, $q$ is moving.

To define a run we first introduce two transition relations over node configurations: the non-moving relation $\rightarrow_\varepsilon$ and the moving relation $\rightarrow_\triangleright$. We start with $\rightarrow_\varepsilon$. If the transition corresponding to a thread is a $\mathsf{store}(q)$, the automaton sets the register with the current data value and continues the execution of the thread with state $q$; if it is $\mathsf{eq}$, the thread accepts (and in this case disappears from the configuration) if the current datum is equal to that of the register, otherwise the computation for that thread cannot continue. The reader can






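$$\begin{aligned}
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \{(q_j,d')\} \cup \Delta \rangle && \text{if } \delta(q) = q_1 \vee q_2,\ j \in \{1,2\} && (3.1)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \{(q_1,d'),(q_2,d')\} \cup \Delta \rangle && \text{if } \delta(q) = q_1 \wedge q_2 && (3.2)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \{(q',d)\} \cup \Delta \rangle && \text{if } \delta(q) = \mathsf{store}(q') && (3.3)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \Delta \rangle && \text{if } \delta(q) = \mathsf{eq} \text{ and } d = d' && (3.4)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \Delta \rangle && \text{if } \delta(q) = \overline{\mathsf{eq}} \text{ and } d \neq d' && (3.5)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \Delta \rangle && \text{if } \delta(q) = \beta? \text{ and } \beta = \alpha && (3.6)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \Delta \rangle && \text{if } \delta(q) = b \text{ and } b = a && (3.7)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \Delta \rangle && \text{if } \delta(q) = \bar{b} \text{ and } b \neq a && (3.8)
\end{aligned}$$

Figure 2: Definition of the transition relation $\rightarrow_\varepsilon\ \subseteq \mathcal{C}_{ARA} \times \mathcal{C}_{ARA}$, given a configuration $\rho = \langle i, \alpha, (a,d), \{(q,d')\} \cup \Delta \rangle$.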

check that the rest of the cases defined in Figure 2 follow the intuition of an alternating automaton.

The cases that follow correspond to our extensions to the model of [8]. The guess instruction extends the model with the ability of storing any datum from the domain $\mathbb{D}$. Whenever $\delta(q) = \mathsf{guess}(q')$ is executed, a data value (nondeterministically chosen) is saved in the register.

$$\rho \rightarrow_\varepsilon \langle i, \alpha, (a,d), \{(q',e)\} \cup \Delta \rangle \quad \text{if } \delta(q) = \mathsf{guess}(q'),\ e \in \mathbb{D} \qquad (3.9)$$

Note that the store instruction may be simulated with the guess, eq and $\wedge$ instructions, while guess cannot be expressed by the ARA model.
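
One way to see the first claim is sketched below. The auxiliary states $q_1$ and $q_2$ are hypothetical names introduced only for this illustration; they are assumed to be fresh states not used elsewhere in the automaton. To simulate $\delta(q) = \mathsf{store}(q')$ one can take

$$\delta(q) = \mathsf{guess}(q_1), \qquad \delta(q_1) = q_2 \wedge q', \qquad \delta(q_2) = \mathsf{eq}.$$

Starting from a thread $(q, d')$ at a position with datum $d$, the guess yields $(q_1, e)$ for some $e \in \mathbb{D}$, the conjunction yields the two threads $(q_2, e)$ and $(q', e)$, and the thread $(q_2, e)$ can only disappear through $\mathsf{eq}$ when $e = d$. Hence every accepting run must have guessed $e = d$, which leaves exactly the thread $(q', d)$ that rule (3.3) would have produced.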

The 'spread' instruction is an unconventional operator in the sense that it depends on the data of all threads in the current configuration with a certain state. Whenever $\delta(q) = \mathsf{spread}(q_2, q_1)$ is executed, a new thread with state $q_1$ and datum $e$ is created for each thread $(q_2, e)$ present in the configuration. With this operator we can code a universal quantification over all the data values that appeared so far (i.e., that appeared in smaller positions). We demand that this transition may only be applied if all other possible $\rightarrow_\varepsilon$ kind of transitions were already executed. In other words, only spread transitions or moving transitions are present in the configuration.

$$\rho \rightarrow_\varepsilon \langle i, \alpha, (a,d), \{(q_1,e) \mid (q_2,e) \in \Delta\} \cup \Delta \rangle \qquad (3.10)$$

iff $\delta(q) = \mathsf{spread}(q_2, q_1)$ and for all $(\tilde{q}, \tilde{d}) \in \Delta$ either $\delta(\tilde{q})$ is a spread or a moving instruction. We also use a weaker one-argument version of spread.

$$\rho \rightarrow_\varepsilon \langle i, \alpha, (a,d), \{(q_1,e) \mid \exists q_2 . (q_2,e) \in \Delta\} \cup \Delta \rangle \qquad (3.11)$$

iff $\delta(q) = \mathsf{spread}(q_1)$ and for all $(\tilde{q}, \tilde{d}) \in \Delta$ either $\delta(\tilde{q})$ is a spread or a moving instruction.
Notice that the one-argument version of spread can be simulated with the two-argument version, and hence we do not include it in the definition of the automaton. Also, note that we enforce the spread operation to be executed once all other non-moving transitions have been applied, in order to take into account all the data values that may have been introduced in these transitions, as a result of guess operations. This behavior simplifies the reduction from the satisfiability of forward-XPath we will show later. This reduction will only need to use the weak one-argument version of spread.

The $\rightarrow_\triangleright$ transition advances all threads of the node simultaneously, and is defined, for any type $\alpha' \in \{\triangleright, \bar{\triangleright}\}$ and symbol with data value $\gamma' \in \mathbb{A} \times \mathbb{D}$,

$$\langle i, \triangleright, \gamma, \Delta \rangle \rightarrow_\triangleright \langle i + 1, \alpha', \gamma', \Delta_\triangleright \rangle \qquad (3.12)$$

iff

(i) for all $(q, d) \in \Delta$, $\delta(q)$ is moving; and

(ii) $\Delta_\triangleright = \{(q',d) \mid (q,d) \in \Delta,\ \delta(q) = \triangleright q'\}$.

Finally, we define the transition between configurations as $\rightarrow\ :=\ \rightarrow_\varepsilon \cup \rightarrow_\triangleright$.

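A run over a data word $w = \mathbf{a} \otimes \mathbf{d}$ is a nonempty sequence $C_1 \rightarrow \cdots \rightarrow C_n$ with $C_1 = \langle 1, \alpha_0, \gamma_0, \Delta_0 \rangle$ and $\Delta_0 = \{(q_I, \mathbf{d}(1))\}$ (i.e., the thread consisting in the initial state with the first datum), such that for every $j \in [n]$ with $C_j = \langle i, \alpha, \gamma, \Delta \rangle$: (1) $i \in pos(w)$; (2) $\gamma = w(i)$; and (3) $\alpha = type_w(i)$. We say that the run is accepting iff $C_n = \langle i, \alpha, \gamma, \emptyset \rangle$ contains an empty set of threads. If for an automaton $\mathcal{A}$ we have that $\mathcal{L}(\mathcal{A}) \neq \emptyset$ we say that $\mathcal{A}$ is nonempty.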
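
The semantics above can be made concrete with a small interpreter. The following Python sketch is only an illustration under stated assumptions: the tuple encoding of instructions (`("store", q)`, `("move", q)`, and so on), the strings `">"` and `"|"` for the two word types, and the function names are all hypothetical choices, not notation from the paper. The guess instruction and the letter and type tests are omitted to keep the sketch short (guess is infinitely branching).

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Thread = Tuple[str, int]                       # (state, register value)
Config = Tuple[int, str, Tuple[str, int], FrozenSet[Thread]]

MOVE, LAST = ">", "|"   # assumed encodings: "has a next position" / "is the last position"


def eps_successors(delta: Dict[str, tuple], cfg: Config) -> List[Config]:
    """All configurations reachable by one non-moving step (subset of rules 3.1-3.5, 3.10)."""
    i, typ, (a, d), threads = cfg
    out = []
    for (q, reg) in threads:
        rest = threads - {(q, reg)}            # the set written Delta in the rules
        ins = delta[q]
        kind = ins[0]
        if kind == "or":                       # rule (3.1): pick one disjunct
            new = [frozenset({(ins[1], reg)}), frozenset({(ins[2], reg)})]
        elif kind == "and":                    # rule (3.2): fork into both conjuncts
            new = [frozenset({(ins[1], reg), (ins[2], reg)})]
        elif kind == "store":                  # rule (3.3): register := current datum
            new = [frozenset({(ins[1], d)})]
        elif kind == "eq":                     # rule (3.4): thread accepts and vanishes
            new = [frozenset()] if reg == d else []
        elif kind == "neq":                    # rule (3.5)
            new = [frozenset()] if reg != d else []
        elif kind == "spread":                 # rule (3.10): ("spread", q2, q1)
            ready = all(delta[p][0] in ("move", "spread") for (p, _) in rest)
            new = [frozenset((ins[2], e) for (p, e) in rest if p == ins[1])] if ready else []
        else:                                  # "move" is not a non-moving instruction
            new = []
        out.extend((i, typ, (a, d), rest | n) for n in new)
    return out


def moving_step(delta: Dict[str, tuple], cfg: Config,
                next_type: str, next_symbol: Tuple[str, int]) -> Optional[Config]:
    """Rule (3.12): advance every thread at once, or None if the step is not enabled."""
    i, typ, _, threads = cfg
    if typ != MOVE or any(delta[q][0] != "move" for (q, _) in threads):
        return None
    return (i + 1, next_type, next_symbol,
            frozenset((delta[q][1], reg) for (q, reg) in threads))
```

For instance, with `delta = {"q0": ("store", "q1"), "q1": ("move", "q2"), "q2": ("eq",)}` the automaton accepts exactly the two-position words whose data values coincide: alternating `eps_successors` and `moving_step` from the initial configuration ends in an empty thread set only in that case.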



3.1. Properties. We show the following two statements

• $\mathcal{L}$(ARA(guess, spread)) is not closed under complementation,

• the ARA(guess, spread) class is more expressive than ARA.

In fact, in the proof below we show the first one, which implies the second one given the fact that the ARA model is closed under complementation.

Proposition 3.2 (Expressive power). 

(a) the ARA(guess) class is more expressive than ARA; 

(b) the ARA(spread) class is more expressive than ARA. 

Proof. Let $w = \mathbf{a} \otimes \mathbf{d}$ be a data word. To prove (a), consider the property "There exists a datum $d$ and a position $i$ with $\mathbf{d}(i) = d$, $\mathbf{a}(i) = a$, and there is no position $j < i$ with $\mathbf{d}(j) = d$, $\mathbf{a}(j) = b$." This property can be easily expressed by ARA(guess). It suffices to guess the data value $d$ and check that we can reach an element $(a,d)$, and that for every previous element $(b,d')$ we have that $d \neq d'$. We argue that this property cannot be expressed by the ARA model. Suppose ad absurdum that it is expressible. This means that its negation would also be expressible by ARA (since they are closed under complementation). The negation of this property states

"For every data value $d$, if there is an element $(a, d)$ in the word, then there is a previous element $(b, d)$."    (P1)

With this kind of property one can code an accepting run of a Minsky machine, whose emptiness problem is undecidable. This would prove that ARA(guess) have an undecidable emptiness problem, which is in contradiction with the decidability proof that we will give in Section 3.2. Let us see how the reduction works.

The emptiness problem for Minsky machines is known to be undecidable even with an alphabet consisting of one symbol, so we disregard the letters read by the automaton in the following description. Consider then a 2-counter alphabet-blind Minsky machine whose instructions are of the form $(q, \ell, q')$ with $\ell \in \{\mathsf{inc}, \mathsf{dec}, \mathsf{ifzero}\} \times \{1,2\}$ being the operation over the counters, and $q, q'$ states from the automaton's set of states $Q$. A run on this automaton is a sequence of applications of transition rules, for example

$$(q_1,\mathsf{inc}_1,q_2)\ (q_2,\mathsf{inc}_2,q_3)\ (q_3,\mathsf{inc}_1,q_2)\ (q_2,\mathsf{dec}_1,q_1)\ (q_1,\mathsf{dec}_1,q_2)\ (q_2,\mathsf{ifzero}_1,q_3)$$

Figure 3: A property not expressible in ARA.

This run has an associated data word over the alphabet

$$Q \times \{\mathsf{inc}, \mathsf{dec}, \mathsf{ifzero}\} \times \{1, 2\} \times Q,$$

where the corresponding data value of each instruction is used to match increments with decrements, for example,

$$\begin{array}{cccccc}
(q_1,\mathsf{inc}_1,q_2) & (q_2,\mathsf{inc}_2,q_3) & (q_3,\mathsf{inc}_1,q_2) & (q_2,\mathsf{dec}_1,q_1) & (q_1,\mathsf{dec}_1,q_2) & (q_2,\mathsf{ifzero}_1,q_3)\\
1 & 2 & 3 & 2 & 1 & 4
\end{array}$$

Using the ARA model we can make sure that (i) all increments have different data values and all decrements have different data values; and (ii) for every $(\_, \mathsf{inc}_i, \_)$ element with data value $d$ that occurs to the left of a $(\_, \mathsf{ifzero}_i, \_)$, there must be a $(\_, \mathsf{dec}_i, \_)$ element with data value $d$ that occurs in between. [8] shows how to express these properties using ARA. However, properties (i) and (ii) are not enough to make sure that every prefix of the run ending in an $\mathsf{ifzero}_i$ instruction must have as many increments as decrements of counter $i$. Indeed, there could be more decrements than increments (but not the opposite, thanks to (ii)).

The missing condition to verify that the run is correct is: (iii) for every decrement there exists a previous increment with the same data value. In fact, we can see that property (P1) can express condition (iii): we only need to replace $a$ with a decrement transition in the coding, and $b$ with an increment transition of the same counter. But then, assuming that property (P1) can be expressed by ARA(guess), the emptiness problem for ARA(guess) is undecidable. This is absurd, as the emptiness problem is decidable, as we will show later on in Theorem 3.5.
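
To make the role of the three conditions concrete, here is a small Python sketch that checks them directly on an encoded run. It is only an illustration: the representation of a run as a list of `(operation, counter, datum)` triples (with the machine states dropped), the function names, and the sample words in the usage note are assumptions made for this example, not part of the construction in the paper.

```python
from typing import List, Tuple

Step = Tuple[str, int, int]   # (operation, counter index, data value); states omitted


def cond_i(word: List[Step]) -> bool:
    """(i) increments carry pairwise distinct data values, and so do decrements."""
    incs = [d for (op, _, d) in word if op == "inc"]
    decs = [d for (op, _, d) in word if op == "dec"]
    return len(incs) == len(set(incs)) and len(decs) == len(set(decs))


def cond_ii(word: List[Step]) -> bool:
    """(ii) every inc_c with datum d left of an ifzero_c is followed by a
    dec_c with datum d before that ifzero_c."""
    for z, (opz, cz, _) in enumerate(word):
        if opz != "ifzero":
            continue
        for k in range(z):
            op, c, d = word[k]
            if op == "inc" and c == cz:
                if not any(word[m] == ("dec", c, d) for m in range(k + 1, z)):
                    return False
    return True


def cond_iii(word: List[Step]) -> bool:
    """(iii) every dec_c with datum d is preceded by an inc_c with datum d."""
    for k, (op, c, d) in enumerate(word):
        if op == "dec" and ("inc", c, d) not in word[:k]:
            return False
    return True
```

On the hypothetical word `[("inc", 1, 1), ("dec", 1, 1), ("dec", 1, 9), ("ifzero", 1, 4)]` conditions (i) and (ii) hold while (iii) fails: the second decrement has no matching increment, which is exactly the kind of spurious run that (i) and (ii) alone cannot exclude.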



Using a similar reasoning as before, we show (b): the ARA(spread) class is more expressive than ARA. Consider the property: "There exists a position $i$ labeled $b$ such that $\mathbf{d}(i) \neq \mathbf{d}(j)$ for all $j < i$ with $\mathbf{a}(j) = a$," as depicted in Figure 3. Let us see how this can be coded into ARA(spread). Assuming $q_0$ is the initial state, the transitions should reflect that every datum with label $a$ seen along the run is saved with a state $q_a$, and that this state is in charge of propagating this datum. Then, we guess a position labeled with $b$ and check that all these stored values under $q_a$ are different from the current one. For succinctness we write the transitions as positive boolean combinations of the basic operations.

$$\delta(q_0) = (b \wedge \mathsf{spread}(q_a, q_1)) \vee ((\bar{a} \vee \mathsf{store}(q_a)) \wedge \triangleright q_0), \qquad \delta(q_1) = \overline{\mathsf{eq}}, \qquad \delta(q_a) = (\bar{\triangleright}? \vee \triangleright q_a).$$
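
As a sanity check on what the automaton above is meant to accept, the property can also be tested directly on a data word. The Python sketch below is only an illustration; the function name and the list-of-pairs encoding of data words are assumptions of this example. The set `seen_a` plays the role of the data values carried by the $q_a$ threads, and the membership test at a $b$ position mirrors what $\mathsf{spread}(q_a, q_1)$ followed by $\overline{\mathsf{eq}}$ verifies.

```python
from typing import List, Tuple


def has_fresh_b(word: List[Tuple[str, int]]) -> bool:
    """True iff some position labeled 'b' carries a datum that differs from
    the datum of every strictly earlier position labeled 'a'."""
    seen_a = set()                      # data values stored under state q_a so far
    for letter, datum in word:
        if letter == "b" and datum not in seen_a:
            return True                 # this is the guessed b position
        if letter == "a":
            seen_a.add(datum)
    return False
```

For example, `has_fresh_b([("a", 1), ("b", 2)])` holds, whereas `has_fresh_b([("a", 1), ("b", 1)])` does not, since the only $b$ repeats the datum of an earlier $a$.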
This property cannot be expressed by the ARA model. Were it expressible, then its negation

"for every element $b$ there exists a previous one labeled $a$ with the same data value"    (P2)

would also be. Just as before we can use property (P2) to express condition (iii), and force that for every decrement in a coding of a Minsky machine there exists a corresponding previous increment. This leads to a contradiction by proving that the emptiness problem for ARA(spread) is undecidable. □

Corollary 3.3. ARA(guess), ARA(spread) and ARA(guess, spread) are not closed under complementation.

Proof. If they were closed under complementation, then we could express some of the properties described in the proof of Proposition 3.2, resulting in an undecidable model, which is in contradiction with Theorem 3.5. □



We then have the following properties of the automata model. 

Proposition 3.4 (Boolean operations). The class $\mathcal{L}$(ARA(spread, guess)) has the following properties:

(i) it is closed under union, 

(ii) it is closed under intersection, 

(iii) it is not closed under complementation. 

Proof sketch. Items (i) and (ii) are straightforward if we notice that the first argument of spread ensures that this transition is always relative to the states of one of the automata being under intersection or union. Item (iii) follows from Corollary 3.3. □



3.2. Emptiness problem. This section is dedicated to showing the following theorem.

Theorem 3.5. The emptiness problem for ARA(guess, spread) is decidable.

As already mentioned, decidability for ARA was proved in [8]. Here we propose an alternative approach that simplifies the proof of decidability of the two extensions spread and guess.

The proof goes as follows. We will define a wqo $\precsim$ over $\mathcal{C}_{ARA}$ and show that $(\mathcal{C}_{ARA}, \rightarrow)$ is rdc with respect to $\precsim$ (Lemma 3.10). Note that strictly speaking $\rightarrow$ is an infinite-branching transition system as $\rightarrow_\triangleright$ may take any value from the infinite set $\mathbb{D}$, and $\rightarrow_\varepsilon$ can also guess any value. However, it can trivially be restricted to an effective finitely branching one.
Then, by Lemma [2^ (Cara,^) has a computable upward-closed reachability set, and this 
implies that the emptiness problem of ARA(guess, spread) is decidable. 

Since our model of automata only cares about equality or inequality of data values, it 
is convenient to work modulo renaming of data values. 

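Definition 3.6 ($\sim$). We say that two configurations $\rho, \rho' \in \mathcal{C}_{ARA}$ are equivalent (notation $\rho \sim \rho'$) if there is a bijection $f : data(\rho) \to data(\rho')$ such that $f(\rho) = \rho'$, where $f(\rho)$ stands for the replacement of every data value $d$ by $f(d)$ in $\rho$.

Definition 3.7 ($\preceq$). We first define the relation $(\mathcal{C}_{ARA}, \preceq)$ such that

$$\langle i, \alpha, \gamma, \Delta \rangle \preceq \langle i', \alpha', \gamma', \Delta' \rangle$$

iff $\alpha = \alpha'$, $\gamma = \gamma'$, and $\Delta \subseteq \Delta'$.

Notice that by the definition above we are 'abstracting away' the information concerning the position $i$. We finally define $\precsim$ to be $\preceq$ modulo $\sim$.

Definition 3.8 ($\precsim$). We define $\rho \precsim \rho'$ iff there is $\rho'' \sim \rho'$ with $\rho \preceq \rho''$.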
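
Unfolding Definitions 3.6 to 3.8, $\rho \precsim \rho'$ holds exactly when there is an injection from $data(\rho)$ into $data(\rho')$ that maps the current datum to the current datum and sends every thread of $\rho$ to a thread of $\rho'$. The Python sketch below checks this by brute force. It is an illustration only: the encoding of a configuration as `(type, letter, datum, threads)` with the position dropped, and the function name, are assumptions of this example, and the enumeration of injections is exponential, so it is suitable for small instances only.

```python
from itertools import permutations
from typing import FrozenSet, Tuple

Thread = Tuple[str, int]
Abstract = Tuple[str, str, int, FrozenSet[Thread]]   # (type, letter, datum, threads)


def precsim(rho: Abstract, tau: Abstract) -> bool:
    """Brute-force test of rho <= tau modulo renaming of data values."""
    (t1, a1, d1, th1), (t2, a2, d2, th2) = rho, tau
    if (t1, a1) != (t2, a2):
        return False
    src = sorted({d1} | {d for _, d in th1})          # data(rho)
    dst = sorted({d2} | {d for _, d in th2})          # data(tau)
    # Try every injective assignment of data(rho) into data(tau).
    for image in permutations(dst, len(src)):
        f = dict(zip(src, image))
        if f[d1] == d2 and all((q, f[d]) in th2 for (q, d) in th1):
            return True
    return False
```

For example, the configuration with current datum 1 and threads $\{(q,1),(p,2)\}$ is below the one with current datum 7 and threads $\{(q,7),(p,9),(r,9)\}$ via the renaming $1 \mapsto 7$, $2 \mapsto 9$, but not the other way around.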

The following lemma follows from the definitions.

Lemma 3.9. $(\mathcal{C}_{ARA}, \precsim)$ is a well-quasi-order.

Proof. The fact that $\precsim$ is a quasi-order (i.e., reflexive and transitive) is immediate from its definition. To show that it is a well-quasi-order, suppose we have an infinite sequence of configurations $\rho_1 \rho_2 \rho_3 \cdots$. It is easy to see that it contains an infinite subsequence $\tau_1 \tau_2 \tau_3 \cdots$ such that all its elements are of the form $\langle i, \alpha_0, (a_0, d), \Delta \rangle$ with

• $a_0$ and $\alpha_0$ fixed, and

• $\Delta(d) = C_0$ fixed.

This is because we can see each of these elements as a finite coloring, and apply the pigeonhole principle on the infinite set $\{\rho_i\}_i$.

Consider then the function $g_\Delta : \wp(Q) \to \mathbb{N}_0$, such that $g_\Delta(S) = |\{d \mid S = \Delta(d)\}|$ (we can think of $g_\Delta$ as a tuple of $(\mathbb{N}_0)^{2^{|Q|}}$). Assume the relation $\trianglelefteq$ defined as $\Delta \trianglelefteq \Delta'$ iff $g_\Delta(S) \leq g_{\Delta'}(S)$ for all $S$. By Dickson's Lemma $\trianglelefteq$ is a wqo, and then there are two $\tau_i = \langle i', \alpha_0, (a_0, d_i), \Delta_i \rangle$, $\tau_j = \langle j', \alpha_0, (a_0, d_j), \Delta_j \rangle$, $i < j$ such that $\Delta_i \trianglelefteq \Delta_j$. For each $S \subseteq Q$, there exists an injective mapping $f_S : \{d \mid \Delta_i(d) = S\} \to \{d \mid \Delta_j(d) = S\}$ such that $f_S(d_i) = d_j$, as the latter set is bigger than the former by $\trianglelefteq$. We define the injection $f : data(\tau_i) \to data(\tau_j)$ as the (disjoint) union of all $f_S$'s. The union is disjoint since for every data value $d$ and set of threads $\Delta$, there is a unique set $S$ such that $\Delta(d) = S$. We then have that $\tau_i \sim f(\tau_i) \preceq \tau_j$. Hence, $\tau_i \precsim \tau_j$. □
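
The counting function used in this proof is easy to compute. The following Python sketch is an illustration only: the names `profile` and `dominated` are hypothetical, and only data values that actually occur in some thread are counted (so the entry for the empty set of states is left out). It builds the vector $g_\Delta$ and compares two such vectors componentwise, which is the order Dickson's Lemma is applied to.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

Thread = Tuple[str, int]


def profile(threads: Iterable[Thread]) -> Counter:
    """Map each nonempty set of states S to the number of data values d
    whose set of states Delta(d) is exactly S."""
    by_datum = {}
    for q, d in threads:
        by_datum.setdefault(d, set()).add(q)
    return Counter(frozenset(states) for states in by_datum.values())


def dominated(threads1: Iterable[Thread], threads2: Iterable[Thread]) -> bool:
    """Componentwise comparison g_{Delta1}(S) <= g_{Delta2}(S) for all S."""
    g1, g2 = profile(threads1), profile(threads2)
    return all(g2[S] >= n for S, n in g1.items())
```

With threads $\{(q,1),(p,1),(q,2)\}$ the profile assigns 1 to $\{q,p\}$ and 1 to $\{q\}$; it is dominated by the profile of $\{(q,5),(p,5),(q,6),(q,7)\}$, which assigns 1 to $\{q,p\}$ and 2 to $\{q\}$.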

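The core of this proof is centered in the following lemma.

Lemma 3.10. $(\mathcal{C}_{ARA}, \rightarrow)$ is rdc with respect to $(\mathcal{C}_{ARA}, \precsim)$.

Proof. We shall show that for all $\rho, \tau, \rho' \in \mathcal{C}_{ARA}$ such that $\rho \rightarrow \tau$ and $\rho' \precsim \rho$, there is $\tau'$ such that $\tau' \precsim \tau$ and either $\rho' \rightarrow \tau'$ or $\tau' = \rho'$. Since by definition of $\precsim$ we work modulo $\sim$, we can further assume that $\rho' \preceq \rho$ without any loss of generality. The proof is a simple case analysis of the definitions for $\rightarrow$. All cases are treated alike, here we present the most representative. Suppose first that $\rho \rightarrow_\varepsilon \tau$, then one of the definition conditions of $\rightarrow_\varepsilon$ must apply.

If Eq. (3.4) of the definition of $\rightarrow_\varepsilon$ (Fig. 2) applies, let

$$\rho = \langle i, \alpha, (a, d), \{(q, d)\} \cup \Delta \rangle \rightarrow_\varepsilon \tau = \langle i, \alpha, (a, d), \Delta \rangle$$

with $\delta(q) = \mathsf{eq}$. Let $\rho' = \langle i', \alpha, (a,d), \Delta' \rangle \preceq \rho$. If $(q,d) \in \Delta'$, we can then apply the same $\rightarrow_\varepsilon$-transition obtaining $\rho \succeq \rho' \rightarrow_\varepsilon \tau' \preceq \tau$. If there is no such $(q,d)$, we can safely take $\rho' = \tau'$ and check that $\tau' \preceq \tau$.

If Eq. (3.3) applies, let

$$\rho = \langle i, \alpha, (a, d), \{(q, d')\} \cup \Delta \rangle \qquad \tau = \langle i, \alpha, (a, d), \{(q', d)\} \cup \Delta \rangle$$

with $\rho \rightarrow_\varepsilon \tau$ and $\delta(q) = \mathsf{store}(q')$. Again let $\rho' \preceq \rho$ containing $(q,d') \in \Delta'$. In this case we can apply the same $\rightarrow_\varepsilon$-transition arriving to $\tau'$ where $\tau' \preceq \tau$. Otherwise, if $(q, d') \notin \Delta'$, we take $\rho' = \tau'$ and then $\tau' \preceq \tau$.

If a guess is performed (Eq. (3.9)), let

$$\rho = \langle i, \alpha, (a, d), \{(q, d')\} \cup \Delta \rangle \rightarrow_\varepsilon \tau = \langle i, \alpha, (a, d), \{(q', e)\} \cup \Delta \rangle$$

with $\delta(q) = \mathsf{guess}(q')$. Let $\rho' = \langle i', \alpha, (a,d), \Delta' \rangle \preceq \rho$. Suppose there is $(q,d') \in \Delta'$, then we take a guess transition from $\rho'$ obtaining some $\tau'$ by guessing $e$ and hence $\tau' \preceq \tau$. Otherwise, if $(q, d') \notin \Delta'$, we take $\tau' = \rho'$ and check that $\tau' \preceq \tau$.

Finally, if a spread is performed (Eq. (3.10)), let

$$\rho = \langle i, \alpha, \gamma, \{(q,d')\} \cup \Delta \rangle \rightarrow_\varepsilon \tau = \langle i, \alpha, \gamma, \{(q_1,e) \mid (q_2,e) \in \Delta\} \cup \Delta \rangle$$

with $\delta(q) = \mathsf{spread}(q_2, q_1)$. Let $\rho' = \langle i', \alpha, \gamma, \Delta' \rangle \preceq \rho$ and suppose there is $(q,d') \in \Delta'$ (otherwise $\tau' = \rho'$ works). We then take a spread instruction $\rho' \rightarrow_\varepsilon \tau'$ and see that $\tau' \preceq \tau$, because any $(q_1,e)$ in $\tau'$ generated by the spread must come from $(q_2,e)$ of $\rho'$, and hence there is some $(q_2, e)$ in $\rho$; now by the spread applied on $\rho$, $(q_1, e)$ is in $\tau$. The remaining cases of $\rightarrow_\varepsilon$ are only easier.

Finally, there can be a 'moving' application of $\rightarrow_\triangleright$. Suppose that we have

$$\rho = \langle i, \triangleright, (a, d), \Delta \rangle \rightarrow_\triangleright \tau = \langle i + 1, \alpha_1, (a_1, d_1), \Delta_1 \rangle.$$

Let $\rho' = \langle i', \triangleright, (a, d), \Delta' \rangle \preceq \rho$. If $\rho'$ is such that $\rho' \preceq \tau$, the relation is trivially compatible. Otherwise, we shall prove that there is $\tau'$ such that $\rho' \rightarrow \tau'$ and $\tau' \preceq \tau$. Condition (i) of $\rightarrow_\triangleright$ (i.e., that all states are moving) holds for $\rho'$, because all the states present in $\rho'$ are also in $\rho$ (by definition of $\preceq$) where the condition must hold. Then, we can apply the $\rightarrow_\triangleright$ transition to $\rho'$ and obtain $\tau'$ of the form $\langle i' + 1, \alpha_1, (a_1, d_1), \Delta'_\triangleright \rangle$. Notice that we are taking $\alpha_1$, $a_1$ and $d_1$ exactly as in $\tau$, and that $\Delta'_\triangleright$ is completely determined by the $\rightarrow_\triangleright$ transition from $\Delta'$. We only need to check that $\tau' \preceq \tau$. Take any $(q,d') \in \Delta'_\triangleright$. There must be some $(q',d') \in \Delta'$ with $\delta(q') = \triangleright q$. Since $\Delta' \subseteq \Delta$, we also have $(q,d') \in \Delta_1$. Hence, $\Delta'_\triangleright \subseteq \Delta_1$ and then $\tau' \preceq \tau$. □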

We just showed that $(\mathcal{C}_{ARA}, \rightarrow)$ is rdc with respect to $(\mathcal{C}_{ARA}, \precsim)$. The only missing ingredient to have decidability is the following, which is trivial.

Lemma 3.11. The set of accepting configurations of $\mathcal{C}_{ARA}$ is downward closed with respect to $\precsim$.

We write $\mathcal{C}_{ARA}/{\sim}$ for the set of configurations modulo $\sim$, obtained by keeping one representative for every equivalence class. Note that the transition system $(\mathcal{C}_{ARA}/{\sim}, \rightarrow)$ is effective. This is just a consequence of the fact that the $\rightarrow$-image of any configuration has only a finite number of configurations modulo $\sim$, and representatives for every class are computable. Hence, we have that $(\mathcal{C}_{ARA}/{\sim}, \rightarrow, \precsim)$ verify conditions (1) and (2) from Proposition 2.7. Finally, condition (3) holds since $(\mathcal{C}_{ARA}, \precsim)$ is computable. We can then apply Lemma 2.8, obtaining that for a given $\mathcal{A} \in$ ARA(guess, spread), testing whether there exists a final configuration $\tau$ and an element $\rho$ in

$$\{\langle 1, \alpha, (a, d_0), \{(q_I, d_0)\} \rangle \mid \alpha \in \{\triangleright, \bar{\triangleright}\},\ a \in \mathbb{A}\}$$
(for any fixed $d_0$) such that $\rho \sim \rho' \rightarrow^* \tau$ (for some $\rho'$) is decidable. Thus, we can decide the emptiness problem and Theorem 3.5 follows.



3.3. Ordered data. We show here that the previous decidability result holds even if we add order to the data domain. Let $(\mathbb{D}, <)$ be a linear order, like for example the reals or the natural numbers with the standard ordering. Let us replace the instructions $\mathsf{eq}$ and $\overline{\mathsf{eq}}$ with

$$\delta(q) := \dots \mid \mathsf{test}(>) \mid \mathsf{test}(<) \mid \mathsf{test}(=) \mid \mathsf{test}(\neq)$$

and let us call this model of automata ARA(guess, spread, <). The semantics is as expected. $\mathsf{test}(<)$ verifies that the data value of the current position is less than the data value in the register, $\mathsf{test}(>)$ that it is greater, and $\mathsf{test}(=)$ (resp. $\mathsf{test}(\neq)$) that both are (resp. are not) equal. We modify $\rightarrow_\varepsilon$ accordingly, for $\rho = \langle i, \alpha, (a, d), \{(q, d')\} \cup \Delta \rangle$.

$$\begin{aligned}
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \Delta \rangle && \text{if } \delta(q) = \mathsf{test}(<) \text{ and } d < d' && (3.13)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \Delta \rangle && \text{if } \delta(q) = \mathsf{test}(>) \text{ and } d > d' && (3.14)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \Delta \rangle && \text{if } \delta(q) = \mathsf{test}(=) \text{ and } d = d' && (3.15)\\
\rho &\rightarrow_\varepsilon \langle i, \alpha, (a,d), \Delta \rangle && \text{if } \delta(q) = \mathsf{test}(\neq) \text{ and } d \neq d' && (3.16)
\end{aligned}$$

All the remaining definitions are preserved. We can show that the emptiness problem for this extended model of automata is still decidable.

Theorem 3.12. The emptiness problem for ARA(guess, spread, <) is decidable.



As in the proof in the previous Section 3.2, we show that there is a wqo $\ll\ \subseteq \mathcal{C}_{ARA} \times \mathcal{C}_{ARA}$ that is rdc with respect to the transition relation, such that the set of final states is $\ll$-downward closed. However, we need to be more careful when showing that we can always work modulo an equivalence relation.

Definition 3.13. A function $f$ is an order-preserving bijection on $D \subseteq \mathbb{D}$ iff it is a bijection on $D$, and furthermore for every $\{d,d'\} \subseteq D$, if $d < d'$ then $f(d) < f(d')$.

The following Lemma is straightforward from the definition just seen.

Lemma 3.14. Let $D \subseteq \mathbb{D}$, $|D| < \infty$. There exists an order-preserving bijection $f$ on $D$ such that

• for every $\{d, d'\} \subseteq D$ such that $d < d'$ there exists $\tilde{d}$ such that $f(d) < \tilde{d} < f(d')$,

• for every $d \in D$ there exists $\tilde{d}$ such that $f(d) < \tilde{d}$, and there exists $\tilde{d}$ such that $\tilde{d} < f(d)$.
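
For intuition, here is one possible witness for Lemma 3.14 in the special case where the data domain is the natural numbers with the usual order (an assumption of this example; the lemma itself is stated for an arbitrary linear order, and the function name `spread_out` is hypothetical). The $k$-th smallest element of $D$ is sent to $2k$, counting from $k = 1$, so that an odd number always fits strictly between two consecutive images, below the smallest image, and above the largest one.

```python
from typing import Dict, Iterable


def spread_out(values: Iterable[int]) -> Dict[int, int]:
    """Order-preserving renaming of a finite set of naturals onto 2, 4, 6, ...

    Between any two images, below the least image and above the greatest
    image there is always an unused natural number.
    """
    ordered = sorted(set(values))
    return {d: 2 * (k + 1) for k, d in enumerate(ordered)}
```

For example, `spread_out([3, 4, 10])` maps 3, 4, 10 to 2, 4, 6, leaving 1, 3, 5 and 7 available as fresh values in every gap.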

Definition 3.15 ($\sim_{ord}$). Let $\rho, \rho'$ be two configurations. We define $\rho \sim_{ord} \rho'$ iff $f(\rho) = \rho'$ for some order-preserving bijection $f$ on $data(\rho)$.

Remark 3.16. If $\rho \rightarrow \rho'$ then there exists $d \in \mathbb{D}$ such that $\{d\} \cup data(\rho) \supseteq data(\rho')$. This is a simple consequence of the definition of $\rightarrow$.

Let us define a version of $\rightarrow$ that works modulo $\sim_{ord}$, and let us call it $\rightarrow_{ord}$.

Definition 3.17. Let $\rho_1, \rho_2$ be two configurations. We define $\rho_1 \rightarrow_{ord} \rho_2$ iff $\rho'_1 \rightarrow \rho'_2$ for some $\rho'_1 \sim_{ord} \rho_1$ and $\rho'_2 \sim_{ord} \rho_2$.

In the previous section, when we had that $\sim$ was simply a bijection and we could not test any linear order <, it was clear that we could work modulo $\sim$. However, here we are working modulo a more complex relation $\sim_{ord}$. In the next lemma we show that working with $\rightarrow$ or working with $\rightarrow_{ord}$ is equivalent.

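Lemma 3.18. If $\rho_1 \rightarrow_{ord} \cdots \rightarrow_{ord} \rho_n$, then $\rho'_1 \rightarrow \cdots \rightarrow \rho'_n$, with $\rho'_i \sim_{ord} \rho_i$ for every $i$.

Proof. The case $n = 1$ is trivial. Otherwise, if $n > 1$, we have $\rho_1 \rightarrow_{ord} \cdots \rightarrow_{ord} \rho_{n-1} \rightarrow_{ord} \rho_n$. Then, by inductive hypothesis, we obtain $\rho'_1 \rightarrow \cdots \rightarrow \rho'_{n-1}$ and $\rho''_{n-1} \rightarrow \rho''_n$ with $\rho'_i \sim_{ord} \rho_i$ for every $i \in \{1, \dots, n-1\}$, and $\rho''_j \sim_{ord} \rho_j$ for every $j \in \{n-1, n\}$. Let $g$ be the witnessing bijection such that $g(\rho''_{n-1}) = \rho'_{n-1}$, and let us assume that $\{d\} \cup data(\rho''_{n-1}) \supseteq data(\rho''_n)$ by Remark 3.16.

Let $f$ be an order-preserving bijection on $\bigcup_{i \leq n-1} data(\rho'_i)$ as in Lemma 3.14. We can then pick a data value $\tilde{d}$ such that

• for every $d' > d$ with $d' \in data(\rho''_{n-1})$, $f(g(d')) > \tilde{d}$, and

• for every $d' < d$ with $d' \in data(\rho''_{n-1})$, $f(g(d')) < \tilde{d}$.

Let $h := (g \circ f)[d \mapsto \tilde{d}]$. We then have

• $\rho''_n \sim_{ord} h(\rho''_n)$,

• $f(\rho'_1) \rightarrow \cdots \rightarrow f(\rho'_{n-1})$,

• $f(\rho'_{n-1}) \rightarrow h(\rho''_n)$.

In other words, $f(\rho'_1) \rightarrow \cdots \rightarrow f(\rho'_{n-1}) \rightarrow h(\rho''_n)$, with $h(\rho''_n) \sim_{ord} \rho_n$ and $f(\rho'_i) \sim_{ord} \rho_i$ for every $i \leq n-1$. □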

Now that we proved that we can work modulo $\sim_{ord}$, we show that we can decide if we can reach an accepting configuration by means of $\rightarrow_{ord}$, by introducing some suitable ordering $\ll$ and showing the following lemmas.

Lemma 3.19. $(\mathcal{C}_{ARA}, \ll)$ is a well-quasi-order.

Lemma 3.20. $(\mathcal{C}_{ARA}, \ll)$ is rdc with respect to $\rightarrow_{ord}$.

Lemma 3.21. The set of accepting configurations of $\mathcal{C}_{ARA}$ is downward closed with respect to $\ll$.

We next define the order $\ll$ and show that the aforementioned lemmas are valid. In the same spirit as before, $\ll$ is defined as $\preceq$ modulo $\sim_{ord}$.

Definition 3.22 ($\ll$). $\rho_1 \ll \rho_2$ iff $\rho'_1 \preceq \rho'_2$ for some $\rho'_1 \sim_{ord} \rho_1$ and $\rho'_2 \sim_{ord} \rho_2$.



To prove Lemma 3.19, given a configuration $\rho = \langle i, \alpha, (a, d), \Delta \rangle$, with $data(\rho) = \{d_1 < \cdots < d_n\}$ we define

$$abs(d_j) = \Delta(d_j) \cup \{\star \mid d_j = d\} \subseteq Q \cup \{\star\}$$
$$abs(\rho) = abs(d_1), \dots, abs(d_n) \in (\wp(Q \cup \{\star\}))^*$$
where $\star \notin Q$ is to denote that the data value is the one of the current configuration.
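
The abstraction and the embedding it is compared with can be sketched in a few lines of Python. This is an illustration under assumptions: the names `abs_config` and `embeds` and the marker string `"*"` are hypothetical, data values are taken to be integers, and letters are compared by set inclusion (one natural reading of the embedding needed for $\ll$; the precise statement used in the paper is its Corollary 2.3).

```python
from typing import FrozenSet, Iterable, List, Tuple

Thread = Tuple[str, int]
STAR = "*"          # marker for the datum of the current position (assumed not a state name)


def abs_config(datum: int, threads: Iterable[Thread]) -> List[FrozenSet[str]]:
    """Word abs(rho): one letter per data value of rho, in increasing order."""
    threads = list(threads)
    values = sorted({datum} | {d for _, d in threads})
    return [
        frozenset({q for q, d in threads if d == v} | ({STAR} if v == datum else set()))
        for v in values
    ]


def embeds(u: List[FrozenSet[str]], v: List[FrozenSet[str]]) -> bool:
    """Subword embedding of u into v where a letter x may be matched with any
    letter y such that x is a subset of y (greedy leftmost matching)."""
    it = iter(v)
    return all(any(x <= y for y in it) for x in u)
```

For instance, a configuration with current datum 5 and threads $\{(q,3),(p,5),(q,7)\}$ is abstracted to the three-letter word $\{q\}\ \{p,\star\}\ \{q\}$, and the word $\{q\}\ \{\star\}$ embeds into it.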



Proof of Lemma 3.19. This is a consequence of Higman's Lemma, stated as in Corollary 2.3. As stated above, we can see each configuration $\rho = \langle i, \alpha, (a, d), \Delta \rangle$ as a word in $(\wp(Q \cup \{\star\}))^*$. As shown in Lemma 3.9, if there is an infinite sequence, there is an infinite subsequence $\rho_1, \rho_2, \dots$, with the same type $\alpha$ and letter $a$. Then for the infinite sequence $abs(\rho_1), abs(\rho_2), \dots$, Corollary 2.3 tells us that there are $i < j$ such that $abs(\rho_i)$ is a substring of $abs(\rho_j)$. This implies that they are in the $\ll$ relation. □

Proof of Lemma 3.20. Note that although $\ll$ is a more restricted wqo, for all the non-moving cases in which the register is not modified (that is, all except guess, spread, and store), the $\rightarrow_\varepsilon$ transition continues to be rdc. This is because for any $\tau \ll \rho \rightarrow_\varepsilon \rho'$, $\rho = \langle i, \alpha, \gamma, \Delta \rangle$ and $\rho' = \langle i', \alpha', \gamma', \Delta' \rangle$ are similar in the following sense. Firstly $data(\rho) = data(\rho')$, and moreover the only difference between $\rho$ and $\rho'$ is that $\Delta'$ is the result of removing some thread $(q,d)$ from $\Delta$ and inserting another one $(q',d)$ with the same data value $d$. This kind of operation is compatible, since $\tau$ can perform the same operation $\tau \rightarrow_\varepsilon \tau'$ on the data value $d'$, supposing that $d'$ is the preimage of $d$ given by the $\ll$ ordering. In this case, $\tau' \ll \rho'$. Otherwise, if there is no preimage of $d$, then $\tau \ll \rho'$. The compatibility of spread is shown equivalently.

Regarding the store instruction, we see that the operation consists in removing some $(q,d)$ from $\Delta$ and inserting some $(q',d_0)$ with $d_0$ the datum of the current configuration. This is downwards compatible since $\tau$ can perform the same operation on the configuration's data value, which is necessarily the preimage of $d_0$.

For the remaining two cases (guess and $\triangleright$) we rely on the premise that we work modulo $\sim_{ord}$. The idea is that we can always assume that we have enough data values to choose from in between the existing ones. That is, for every pair of data values $d < d'$ in a configuration, there is always one in between. We can always assume this since otherwise we can apply a bijection as the one described by Lemma 3.14 to obtain this property. Thus, at each point where we need to guess a data value (as a consequence of a $\mathsf{guess}(q)$ or a $\triangleright q$ instruction) we will have no problem in performing a symmetric action, preserving the embedding relation.

More concretely, suppose the execution of a transition $\delta(q) = \mathsf{guess}(q')$ on a thread $(q,d_j)$ of configuration $\rho$ with $data(\rho) = \{d_1 < \cdots < d_n\}$ guesses a data value $d$ with $d_i < d < d_{i+1}$. Then, for any configuration $\tau \ll \rho$ with $data(\tau) = \{e_1 < \cdots < e_m\}$ and the property just described, there must be an order-preserving injection $f : data(\tau) \to data(\rho)$ with $f(\tau) \preceq \rho$. If $\tau$ contains a thread $(q, e_{j'})$ with $f(e_{j'}) = d_j$ the operation is simulated by guessing a data value $e$ such that $e > e_\ell$ for all $e_\ell$ such that $f(e_\ell) \leq d_i$ and $e < e_k$ for all $e_k$ such that $f(e_k) > d_i$. Such data value $e$ exists as explained before. The rdc compatibility of a $\delta(q) = \triangleright q'$ instruction is shown in an analogous fashion. □



Proof of Lemma 3.21. Given that $\ll$ is a subset of $\precsim$, and that by Lemma 3.11 the set of accepting configurations is $\precsim$-downward closed, it follows that this set is also $\ll$-downward closed. □



Finally, we should note that $(\mathcal{C}_{ARA}/{\sim_{ord}}, \rightarrow_{ord})$ is also finitely branching and effective. As in the proof of Section 3.2, by Lemmas 3.19, 3.20 and 3.21 we have that all the conditions of Proposition 2.7 are met and by Lemma 2.8 we conclude that the emptiness problem for ARA(guess, spread, <) is decidable.



Remark 3.23. Notice that this proof works independently of the particular ordering of $(\mathbb{D}, <)$. It could be dense or discrete, contain accumulation points, be open or closed, etc. In some sense, this automata model is blind to this kind of property. If there is an accepting run on $(\mathbb{D}, <)$ then there is an accepting run on $(\mathbb{D}, <')$ for any linear order $<'$.

Open question 3.24. It is perhaps possible that these results can be extended to prove decidability when $(\mathbb{D}, <)$ is a tree-like partial order, this time making use of Kruskal's tree theorem [25] instead of Higman's Lemma. We leave this issue as an open question.



Remark 3.25 (constants). One can also extend this model with a finite number of constants $\{c_1, \dots, c_n\} \subseteq \mathbb{D}$. In this case, we extend the transitions with the possibility of testing that the data value stored in the register is (or is not) equal to $c_i$, for every $i$. In the proof, it suffices to modify $\sim_{ord}$ to take into account every constant $c_i$. In this case we define that $\rho \sim_{ord} \rho'$ iff $f(\rho) = \rho'$ for some order-preserving bijection $f$ on $data(\rho) \cup \{c_1, \dots, c_n\}$ such that $f(c_i) = c_i$ for every $i$. In this case Lemma 3.14 does not hold anymore, as there could be finitely many elements in between two constants. This is however an easily surmountable obstacle, by adapting Lemma 3.14 to work separately on the $n + 1$ intervals defined by the constants. Suppose $c_1 < \cdots < c_n$. Without any loss of generality we can assume that between $c_i$ and $c_{i+1}$ there are infinitely many data values, or none (we can always add some constants to ensure this). Then, for every infinite interval $[c_i, c_{i+1}]$, we will have a lemma like Lemma 3.14 that we can apply separately.

3.4. Timed automata. Our investigation on register automata also yields new results on
the class of timed automata. An alternating 1-clock timed automaton is an automaton that 
runs over timed words. A timed word is a finite sequence of events. Each event carries 
a symbol from a finite alphabet and a timestamp indicating the quantity of time elapsed 
from the first event of the word. A timed word can hence be seen as a data word over the 
rational numbers, whose data values are strictly increasing. The automaton has alternating 
control and contains one clock to measure the lapse of time between two events (that is, 
the difference between the data of two positions of the data word). It can reset the clock, 
or test whether the clock contains a number equal, less or greater than a constant, from 
some finite set of constants. For more details on this automaton we refer the reader to [3]. 

Register automata over ordered domains have a strong connection with timed automata. The work in [16] shows that the problems of nonemptiness, language inclusion, language equivalence and universality are equivalent (modulo an ExpTime reduction) for timed automata and register automata over a linear order. That is, any of these problems for register automata can be reduced to the same problem on timed automata, preserving the number of registers equal to the number of clocks, and the mode of computation (nondeterministic, alternating). And in turn, any of these problems for timed automata can also be reduced to a similar problem on register automata over a linear order. We argue that this is also true when the automata are equipped with guess and spread.

Consider an extension of 1-clock alternating timed automata, with spread and guess, where

• the operator $\mathsf{spread}(q, q')$ works in the same way as for register automata, duplicating all threads with state $q$ as threads with state $q'$, and

• the $\mathsf{guess}(q)$ operator resets the clock to any value, nondeterministically chosen, and continues the execution with state $q$.

The coding technique of [16] can be adapted to deal with the guessing of a clock (the spread operator being trivially compatible), and one can show the following statement.

Remark 3.26. The emptiness problem for alternating 1-clock timed automata extended with 
guess and spread reduces to the emptiness problem for the class ARA(guess, spread, <). 



Hence, by Remark 3.26 together with Theorem 3.12, we obtain the following.



Remark 3.27. The emptiness problem for alternating 1-clock timed automata extended with 
guess and spread is decidable. 



3.5. A note on complexity. Although the ARA(guess, spread) and ARA(guess, spread, <) 

classes have both non-primitive-recursive complexity, we must remark that the decision pro- 
cedure for the latter has much higher complexity. While the former can be roughly bounded 
by the Ackermann function applied to the number of states, the complexity of the decision 
procedure we give for ARA(guess, spread, <) majorizes every multiply-recursive function (in 



particular, Ackermann's) . In some sense this is a consequence of relying on Higman's Lemma 



instead of |Dickson's Lemma for the termination arguments of our algorithm. 

More precisely, it can be seen that the emptiness problem for ARA(guess, spread, <)
sits in the class $\mathfrak{F}_{\omega^\omega}$ in the Fast Growing Hierarchy [26] (an extension of
the Grzegorczyk Hierarchy for non-primitive-recursive functions) by a reduction to the emptiness
problem for timed one clock automata (see §3.4), which are known to be in this class. However, the
emptiness problem for ARA(guess, spread) belongs to $\mathfrak{F}_{\omega}$ in the hierarchy. The lower
bound follows by a reduction from Incrementing Counter Automata [8], which are hard for
$\mathfrak{F}_{\omega}$ [28, 15]. The upper bound is a consequence of using a saturation algorithm with a
wqo that is the component-wise order of the coordinates of a vector of natural numbers in a controlled
way. The proof that it belongs to $\mathfrak{F}_{\omega}$ goes similarly as for Incrementing Counter
Automata (see [15, §7.2]). We do not know whether the emptiness problem for ARA(guess, spread, <)
is also in $\mathfrak{F}_{\omega}$.
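To give a feeling for the growth rates behind these classes, the following sketch computes the first
finite levels of a fast-growing hierarchy. It assumes one common convention, $F_0(n) = n + 1$ and
$F_{k+1}(n) = F_k^{\,n}(n)$ ($F_k$ iterated $n$ times); the hierarchy of [26] may differ in inessential
details, and the names `fast_growing` and `diagonal` are ours.

```python
def fast_growing(k: int, n: int) -> int:
    """F_k(n) under the assumed convention F_0(n) = n + 1 and
    F_{k+1}(n) = F_k applied n times to n."""
    if k == 0:
        return n + 1
    value = n
    for _ in range(n):                      # iterate F_{k-1} exactly n times
        value = fast_growing(k - 1, value)
    return value


def diagonal(n: int) -> int:
    """Diagonal F_omega(n) = F_n(n); Ackermann-like, feasible only for tiny n."""
    return fast_growing(n, n)
```

Under this convention $F_1(n) = 2n$ and $F_2(n) = n \cdot 2^n$; already `diagonal(3)` is far beyond
what can be computed, which is the point of the diagonal level.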



4. LTL WITH REGISTERS 

The logic LTL$^{\downarrow}$(U,X) is a logic for data words that corresponds to the extension of the
Linear-time Temporal Logic LTL(U,X) on data words, with the ability to use one register 
for storing a data value for later comparisons, studied in [H1|U]. It contains the next (X) 
and until (U) temporal operators to navigate the data word, and two extra operators. The 
freeze operator ${\downarrow}\varphi$ stores the current datum in the register and continues the
evaluation of the formula $\varphi$. The operator ${\uparrow}$ tests whether the current data value is
equal to the one stored in the register.

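As it was shown in [8], if we allow more than one register to store data values, satisfiability
of LTL$^{\downarrow}$(U,X) becomes undecidable. We will then focus on the language that uses
only one register. We study an extension of this language with a restricted form of quantification
over data values. We will actually add two sorts of quantification. On the one
hand, the operators $\forall^{\downarrow}_{\leq}$ and $\exists^{\downarrow}_{\leq}$ quantify universally or
existentially over all the data values occurring before the current point of evaluation. Similarly,
$\forall^{\downarrow}_{\geq}$ and $\exists^{\downarrow}_{\geq}$ quantify over the future elements on the data
word. For our convenience and without any loss of generality, we will work in Negation Normal Form
(nnf), and we use $\bar{\mathsf{U}}$ to denote the dual operator of $\mathsf{U}$, and similarly for
$\mathsf{X}$. Sentences of LTL$^{\downarrow}_{\mathsf{nnf}}(\mathsf{U}, \bar{\mathsf{U}}, \mathsf{X}, \bar{\mathsf{X}}, \mathcal{O})$,
where $\mathcal{O} \subseteq \{\forall^{\downarrow}_{\leq}, \exists^{\downarrow}_{\leq}, \forall^{\downarrow}_{\geq}, \exists^{\downarrow}_{\geq}\}$, are
defined as follows,

$$\varphi ::= a \mid \neg a \mid {\uparrow} \mid \neg{\uparrow} \mid {\downarrow}\varphi \mid \mathsf{X}\varphi \mid \bar{\mathsf{X}}\varphi \mid \mathsf{U}(\varphi,\psi) \mid \bar{\mathsf{U}}(\varphi,\psi) \mid op\,\varphi \mid \varphi \wedge \psi \mid \varphi \vee \psi$$

where $a$ is a symbol from a finite alphabet $\mathbb{A}$, and $op \in \mathcal{O}$. For economy of space we
write LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \mathcal{O})$ to denote this logic. In this notation,
$\mathfrak{F}$ is to mark that we have all the forward modalities: $\mathsf{U}, \bar{\mathsf{U}}, \mathsf{X}, \bar{\mathsf{X}}$.
Notice that the future modality can be defined as $\mathsf{F}\varphi := \mathsf{U}(\varphi, \top)$ and
its dual $\mathsf{G}\varphi$ as the nnf of $\neg\mathsf{F}\neg\varphi$.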

Figure 4 shows the definition of the satisfaction relation $\models$. For example, in this logic
we can express properties like "for every $a$ element there is a future $b$ element with the same
data value" as $\mathsf{G}(\neg a \vee {\downarrow}(\mathsf{F}(b \wedge {\uparrow})))$. We say that
$w = \mathbf{a} \otimes \mathbf{d}$ satisfies $\varphi$, written $w \models \varphi$, if
$w, 1 \models^{\mathbf{d}(1)} \varphi$.
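The satisfaction relation can be prototyped directly. The sketch below is only an illustration under
our own assumptions: formulas are encoded as Python closures, positions are numbered from 0 rather
than 1, $\mathsf{F}$ and $\mathsf{G}$ are read non-strictly (the current position counts), and all function
names are hypothetical.

```python
from typing import Callable, List, Tuple

DataWord = List[Tuple[str, int]]                 # one (label, data value) pair per position
Formula = Callable[[DataWord, int, int], bool]   # (word, position, register) -> truth value


def label(a: str) -> Formula:
    return lambda w, i, d: w[i][0] == a

def not_label(a: str) -> Formula:
    return lambda w, i, d: w[i][0] != a

def up() -> Formula:
    # test: the register equals the current data value
    return lambda w, i, d: d == w[i][1]

def freeze(phi: Formula) -> Formula:
    # store the current data value in the register, then evaluate phi
    return lambda w, i, d: phi(w, i, w[i][1])

def conj(phi: Formula, psi: Formula) -> Formula:
    return lambda w, i, d: phi(w, i, d) and psi(w, i, d)

def disj(phi: Formula, psi: Formula) -> Formula:
    return lambda w, i, d: phi(w, i, d) or psi(w, i, d)

def future(phi: Formula) -> Formula:
    # assumed non-strict: the current position counts as "future"
    return lambda w, i, d: any(phi(w, j, d) for j in range(i, len(w)))

def globally(phi: Formula) -> Formula:
    return lambda w, i, d: all(phi(w, j, d) for j in range(i, len(w)))

def satisfies(w: DataWord, phi: Formula) -> bool:
    # evaluation starts at the first position, with its own datum in the register
    return phi(w, 0, w[0][1])


# G(not a  or  freeze F(b and up)): every a has a later b with the same data value
every_a_has_matching_b = globally(
    disj(not_label("a"), freeze(future(conj(label("b"), up()))))
)
```

For instance, the word `[("a", 1), ("b", 1), ("a", 2), ("b", 2)]` satisfies the formula, whereas
`[("a", 1), ("b", 2)]` does not.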



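4.1. Satisfiability problem. This section is dedicated to the satisfiability problem for
LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \forall^{\downarrow}_{\leq}, \exists^{\downarrow}_{\geq})$. But first let us show
that $\exists^{\downarrow}_{\leq}$ and $\forall^{\downarrow}_{\geq}$ result in an undecidable logic.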

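Theorem 4.1. Let $\exists^{\downarrow}_{<}$ be the operator $\exists^{\downarrow}_{\leq}$ restricted only to the
data values occurring strictly before the current point of evaluation. Then, on finite data words:

(1) satisfiability of LTL$^{\downarrow}_{\mathsf{nnf}}(\mathsf{F}, \mathsf{G}, \exists^{\downarrow}_{<})$ is undecidable; and

(2) satisfiability of LTL$^{\downarrow}_{\mathsf{nnf}}(\mathsf{F}, \mathsf{G}, \forall^{\downarrow}_{\geq})$ is undecidable.

$(w, i) \models^{d} \mathsf{U}(\varphi, \psi)$ iff for some $i \leq j \in pos(w)$ and for all $i \leq k < j$
we have $(w, j) \models^{d} \varphi$ and $(w, k) \models^{d} \psi$
$(w, i) \models^{d} \mathsf{X}\varphi$ iff $i + 1 \in pos(w)$ and $(w, i + 1) \models^{d} \varphi$
$(w, i) \models^{d} \exists^{\downarrow}_{\geq}\varphi$ iff there exists $i \leq j \in pos(w)$ such that $(w, i) \models^{\mathbf{d}(j)} \varphi$
$(w, i) \models^{d} \forall^{\downarrow}_{\leq}\varphi$ iff for all $1 \leq j \leq i$ we have $(w, i) \models^{\mathbf{d}(j)} \varphi$

Figure 4: Semantics of LTL$^{\downarrow}(\mathsf{U}, \bar{\mathsf{U}}, \mathsf{X}, \bar{\mathsf{X}}, \exists^{\downarrow}_{\geq}, \forall^{\downarrow}_{\leq})$ for a data word $w = \mathbf{a} \otimes \mathbf{d}$ and $i \in pos(w)$.

^The emptiness problem for timed one clock automata can be at the same time reduced to that of Lossy
Channel Machines [2], which are known to be 'complete' for this class, i.e., in
$\mathfrak{F}_{\omega^\omega} \setminus \mathfrak{F}_{<\omega^\omega}$ (see [6]).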

Proof. We prove (1) and (2) by reduction of the halting problem for Minsky machines.
We show that these logics can code an accepting run of a 2-counter Minsky machine as in
Proposition 3.2. Indeed, we show that the same kind of properties are expressible in this
logic. To prove this, we build upon some previous results [17] showing that
LTL$^{\downarrow}_{\mathsf{nnf}}(\mathsf{F}, \mathsf{G})$ can code conditions (i) and (ii) of the proof of
Proposition 3.2. Here we show that both LTL$^{\downarrow}_{\mathsf{nnf}}(\mathsf{F}, \mathsf{G}, \exists^{\downarrow}_{<})$
and LTL$^{\downarrow}_{\mathsf{nnf}}(\mathsf{F}, \mathsf{G}, \forall^{\downarrow}_{\geq})$ can express condition (iii),
ensuring that for every decrement ($dec_i$) there is a previous increment ($inc_i$) with the same data
value. Let us see how to code this.

(1) The LTL$^{\downarrow}_{\mathsf{nnf}}(\mathsf{F}, \mathsf{G}, \exists^{\downarrow}_{<})$ formula

$$\mathsf{G}(dec_i \rightarrow \exists^{\downarrow}_{<}\,{\uparrow})$$

states that the data value of every decrement must not be new, and in the context of
this coding this means that it must have been introduced by an increment instruction.

(2) The LTL$^{\downarrow}_{\mathsf{nnf}}(\mathsf{F}, \mathsf{G}, \forall^{\downarrow}_{\geq})$ formula

$$\forall^{\downarrow}_{\geq}\big(\mathsf{F}(dec_i \wedge {\uparrow}) \rightarrow \mathsf{F}(inc_i \wedge {\uparrow})\big)$$

evaluated at the first element of the data word expresses that for every data value:
if there is a decrement with value $d$, then there is an increment with value $d$. It is
then easy to ensure that they appear in the correct order (first the increment, then the
decrement).

The addition of any of these conditions to the coding of [17] results in a coding of an n-counter
Minsky machine, whose emptiness problem is undecidable. □
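Condition (iii) is a simple property of data words, and it may help to see it checked directly. The
sketch below assumes a hypothetical encoding in which instructions appear as labels `"inc1"`,
`"dec1"`, and so on; it tests the intended property (each decrement reuses the datum of a strictly
earlier increment of the same counter), not the formulas themselves.

```python
from typing import List, Tuple


def decrements_have_earlier_increment(word: List[Tuple[str, int]], counter: int) -> bool:
    """Condition (iii) for one counter: every dec datum was introduced by a
    strictly earlier inc of the same counter carrying the same data value."""
    seen_increments = set()
    for lab, datum in word:
        if lab == f"dec{counter}" and datum not in seen_increments:
            return False                 # decrement with a datum no increment introduced
        if lab == f"inc{counter}":
            seen_increments.add(datum)
    return True
```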

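Corollary 4.2. The satisfiability problems for both LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \exists^{\downarrow}_{\leq})$
and LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \forall^{\downarrow}_{\geq})$ are undecidable.

Proof. The property of item (1) in the proof of Theorem 4.1 can be equally coded in
LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \exists^{\downarrow}_{\leq})$ as
$\mathsf{G}(\mathsf{X}(dec_i) \rightarrow \exists^{\downarrow}_{\leq}(\mathsf{X}\,{\uparrow}))$. The undecidability of
LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \forall^{\downarrow}_{\geq})$ follows directly from Theorem 4.1, item (2). □

We now turn to our decidability result. We show that
LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \forall^{\downarrow}_{\leq}, \exists^{\downarrow}_{\geq})$ has a decidable
satisfiability problem by a translation to ARA(guess, spread).

The translation to code LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \forall^{\downarrow}_{\leq}, \exists^{\downarrow}_{\geq})$
into ARA(guess, spread) is standard, and follows the same lines as [8] (which at the same time follows
the translation from LTL(U, X) to alternating finite automata). We then obtain the following result.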

Proposition 4.3. ARA(guess, spread) captures LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \exists^{\downarrow}_{\geq}, \forall^{\downarrow}_{\leq})$.

From Proposition 4.3 and Theorem 3.5 the main result follows, stated next.

Theorem 4.4. The satisfiability problem for LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \exists^{\downarrow}_{\geq}, \forall^{\downarrow}_{\leq})$ is decidable.

We now show how to make the translation to ARA(guess, spread) in order to obtain our
decidability result.



Proof of Proposition 4.3. We show that for every formula
$\eta \in$ LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \forall^{\downarrow}_{\leq}, \exists^{\downarrow}_{\geq})$ there exists a
computable ARA(spread, guess) $\mathcal{A}_\eta$ such that for every data word $w$,

$w$ satisfies $\eta$ iff $\mathcal{A}_\eta$ accepts $w$.

In the construction of $\mathcal{A}_\eta$, we first make sure to maintain all the data values seen so
far as threads of the configuration. We can do this by having 3 special states $q_1, q_2, q_{save}$ in
$\mathcal{A}_\eta$, defining $q_1$ as the initial state, and $\delta$ as follows.

$$\delta(q_1) = \mathsf{store}(q_2) \wedge q_\eta \qquad \delta(q_2) = (\bar{\triangleright}? \vee \triangleright q_1) \wedge q_{save} \qquad \delta(q_{save}) = \bar{\triangleright}? \vee \triangleright q_{save}$$

Now we can assume that at any point of the run, we maintain the data values of all
the previous elements of the data word as threads $\langle q_{save}, d\rangle$. Note that these threads are
maintained until the last element of the data word, at which point the test $\bar{\triangleright}?$ is
satisfied and they are accepted.

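Now we show how to define $\mathcal{A}_\eta$. We proceed by induction on $|\eta|$. If $\eta = a$ or
$\eta = \neg a$, we simply define the set of states as $Q = \{q_1, q_2, q_{save}, q_\eta\}$ and
$\delta(q_\eta) = a$ or $\delta(q_\eta) = \bar{a}$ ($\delta(q_1)$, $\delta(q_2)$ and $\delta(q_{save})$ are
defined as above). If $\eta = {\downarrow}\psi$ (or $\eta = {\uparrow}$, $\eta = \neg{\uparrow}$) we define it as
follows. We add one new state $q_\eta$ to the set of states of $\mathcal{A}_\psi$, we extend the
definition of $\delta$ with $\delta(q_\eta) = \mathsf{store}(q_\psi)$ (or $\delta(q_\eta) = \mathsf{eq}$,
$\delta(q_\eta) = \overline{\mathsf{eq}}$), and we redefine $\delta(q_1)$ with
$\delta(q_1) = \mathsf{store}(q_2) \wedge q_\eta$. If $\eta = \mathsf{F}\psi$, $\eta = \mathsf{G}\psi$, or
$\eta = \mathsf{U}(\psi, \psi')$, it is easy to define $\delta(q_\eta)$ from the definition of
$\mathcal{A}_\psi$ and $\mathcal{A}_{\psi'}$, perhaps adding some new states. On the other hand, if
$\eta = \psi \wedge \psi'$ or $\eta = \psi \vee \psi'$, it is also straightforward as it corresponds to the
alternation and nondeterminism of the automaton.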

Suppose now that $\eta = \forall^{\downarrow}_{\leq}\psi$. We define $\mathcal{A}_\eta$ as $\mathcal{A}_\psi$ with the extra state $q_\eta$ and we
define 5{q^) = spread ((/save , Q'V') A This ensures that at the moment of execution of 

this instruction, all the previous data values in the data word will be taken into account, 
including the current one. 

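Finally, if $\eta = \exists^{\downarrow}_{\geq}\psi$, we build $\mathcal{A}_\eta$ from $\mathcal{A}_\psi$ in the same
fashion as before, adding new states $q_\eta, q'_\eta, q''_\eta$, defining
$\delta(q_\eta) = \mathsf{guess}(q'_\eta)$, $\delta(q'_\eta) = q_\psi \wedge q''_\eta$,
$\delta(q''_\eta) = \mathsf{eq} \vee \triangleright q''_\eta$. Note that $q''_\eta$
checks that the guessed data value appears somewhere in a future position of the word. □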
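The two quantifiers handled by spread and guess in this proof can also be read off the semantics
directly. The following sketch is an illustration only: it uses closures over 0-indexed data words,
takes both bounds to be non-strict (so the current datum is included, as remarked above), and the
names `forall_past` and `exists_future` are ours.

```python
from typing import Callable, List, Tuple

DataWord = List[Tuple[str, int]]                 # one (label, data value) pair per position
Formula = Callable[[DataWord, int, int], bool]   # (word, position, register) -> truth value


def forall_past(phi: Formula) -> Formula:
    """Universal quantifier over past data: phi must hold at the current
    position with the register set to d(j), for every j <= i."""
    return lambda w, i, d: all(phi(w, i, w[j][1]) for j in range(0, i + 1))


def exists_future(phi: Formula) -> Formula:
    """Existential quantifier over future data: phi holds at the current
    position with the register set to d(j), for some j >= i."""
    return lambda w, i, d: any(phi(w, i, w[j][1]) for j in range(i, len(w)))


# Two sample bodies: "a b carrying the register value occurs from here on"
# and "the register differs from the current datum".
later_b_with_register: Formula = lambda w, i, d: any(
    lab == "b" and val == d for lab, val in w[i:]
)
differs_from_current: Formula = lambda w, i, d: d != w[i][1]
```

Note that the position of evaluation does not change: only the register ranges over the data values
of the past (respectively, future) positions, which is what spread over the `q_save` threads and
guess followed by a future equality test achieve in the automaton.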

^Note that this logic already contains LTL$^{\downarrow}$(U,X).

$(w, i) \models^{d} {\uparrow_{>}}$ iff $d > \mathbf{d}(i)$
$(w, i) \models^{d} {\uparrow_{<}}$ iff $d < \mathbf{d}(i)$

Figure 5: Semantics for the operators ${\uparrow_{>}}$, ${\uparrow_{<}}$ for a data word $w = \mathbf{a} \otimes \mathbf{d}$ and $i \in pos(w)$.

Moreover, we argue that these extensions add expressive power. 

Proposition 4.5. On finite data words: 

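(i) The logic LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \forall^{\downarrow}_{\leq})$ is more expressive than LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F})$;

(ii) The logic LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \exists^{\downarrow}_{\geq})$ is more expressive than LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F})$.

Proof. This is a consequence of LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F})$ being closed under negation and
Theorem 4.1. Ad absurdum, if one of these logics were as expressive as
LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F})$, then it would be closed under negation, and then we could
express conditions (1) or (2) of the proof of Theorem 4.1 and hence obtain that
LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F})$ is undecidable. But this leads to a contradiction since by
Theorem 4.4 LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F})$ is decidable. □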



Remark 4.6. The translation of Proposition 4.3 is far from using all the expressive power
of spread. In fact, we can consider a binary operator $\forall^{\downarrow}_{\leq}(\varphi, \psi)$ defined by

$w, i \models^{d} \forall^{\downarrow}_{\leq}(\varphi, \psi)$ iff for all $j \leq i$ such that $w, j \models^{\mathbf{d}(j)} \psi$, we have $w, i \models^{\mathbf{d}(j)} \varphi$,

with $\psi \in$ LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F})$. This operator can be coded into ARA(guess, spread),
using the same technique as in Proposition 4.3. The only difference is that instead of 'saving' every
data value in $q_{save}$, we use several states $q_{save(\psi)}$. Intuitively, only the data values that
verify the test ${\downarrow}\psi$ are stored in $q_{save(\psi)}$. Then, a formula
$\forall^{\downarrow}_{\leq}(\varphi, \psi)$ is translated as $\mathsf{spread}(q_{save(\psi)}, q_\varphi)$.



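4.2. Ordered data. If we consider a linear order over $\mathbb{D}$ as done in Section 3.3, we can
consider LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \exists^{\downarrow}_{\geq}, \forall^{\downarrow}_{\leq})$ with richer tests

$$\varphi ::= {\uparrow_{>}} \mid {\uparrow_{<}} \mid \cdots$$

that access the linear order and compare the data values for $=, <, >$. The semantics are
extended accordingly, as in Figure 5. Let us call this logic
ord-LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \exists^{\downarrow}_{\geq}, \forall^{\downarrow}_{\leq})$. The
translation from ord-LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \exists^{\downarrow}_{\geq}, \forall^{\downarrow}_{\leq})$ to
ARA(guess, spread, <) as defined in Section 3.3 is straightforward. Thus we obtain the following result.

Proposition 4.7. The satisfiability problem for ord-LTL$^{\downarrow}_{\mathsf{nnf}}(\mathfrak{F}, \exists^{\downarrow}_{\geq}, \forall^{\downarrow}_{\leq})$ is decidable.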



Part 2. Data trees 

The second part of this work deals with logics and automata for data trees. In Section 5
we extend the model ARA(guess, spread) to the new model ATRA(guess, spread) that runs
over data trees (instead of data words). The decidability of the emptiness follows easily
from the decidability result shown for ARA(guess, spread). As in the case of data words, this
model allows us to show the decidability of a logic.
In Section 6 we introduce 'forward XPath', a logic for XML documents. The decidability of the
satisfiability problem for this logic will follow by a reduction to the emptiness problem of
ATRA(guess, spread). This reduction is not as easy as that of the first part, since XPath
is closed under negation and our automata model is not closed under complementation.
Indeed, ATRA(guess, spread) and forward XPath have incomparable expressive power.



5. ATRA MODEL 

Herein, we introduce the class of Alternating Tree Register Automata by slightly adapting
the definition for alternating (word) register automata. This model is essentially the same
automaton presented in Part 1, which works on an (unranked, ordered) data tree instead of
a data word. The only difference is that instead of having one instruction $\triangleright$ that means
'move to the next position', we have two instructions $\triangleright$ and $\triangledown$ meaning 'move to
the next sibling to the right' and 'move to the leftmost child'. This class of automata is known as
ATRA(guess, spread). This model of computation will enable us to show decidability of a
large fragment of XPath.

An Alternating Tree Register Automaton (ATRA) consists of a top-down tree
walking automaton with alternating control and one register to store and test data. [24]
shows that its emptiness problem is decidable and non-primitive-recursive. Here, as in
Part 1, we consider an extension with the operators spread and guess. We call this model
ATRA(spread, guess).

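Definition 5.1. An alternating tree register automaton of ATRA(spread, guess) is a tuple
$\mathcal{A} = \langle \mathbb{A}, Q, q_I, \delta\rangle$ such that $\mathbb{A}$ is a finite alphabet; $Q$ is a finite
set of states; $q_I \in Q$ is the initial state; and $\delta : Q \to \Phi$ is the transition function,
where $\Phi$ is defined by the grammar

$$a \mid \bar{a} \mid \odot? \mid \mathsf{store}(q) \mid \mathsf{eq} \mid \overline{\mathsf{eq}} \mid q \wedge q' \mid q \vee q' \mid \triangledown q \mid \triangleright q \mid \mathsf{guess}(q) \mid \mathsf{spread}(q, q')$$

where $a \in \mathbb{A}$, $q, q' \in Q$, $\odot \in \{\triangledown, \bar{\triangledown}, \triangleright, \bar{\triangleright}\}$.

We only focus on the differences with respect to the ARA class. $\triangledown$ and $\triangleright$ are to move
to the leftmost child or to the next sibling to the right of the current position, and as before
'$\odot?$' tests the current type of the position of the tree. For example, using $\bar{\triangledown}?$ we test
that we are in a leaf node, and by $\triangleright?$ that the node has a sibling to its right.
$\mathsf{store}(q)$, $\mathsf{eq}$ and $\overline{\mathsf{eq}}$ work in the same way as in the ARA model. We say
that a state $q \in Q$ is moving if $\delta(q) = \triangleright q'$ or $\delta(q) = \triangledown q'$ for some $q' \in Q$.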

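We define two sorts of configurations: node configurations and tree configurations. In
this context a node configuration is a tuple $\langle x, \alpha, \gamma, \Delta\rangle$ that describes the partial
state of the execution at a position $x$ of the tree. Here $x \in pos(t)$ is the current position in the
tree $t$, $\gamma = t(x) \in \mathbb{A} \times \mathbb{D}$ is the current node's symbol and datum, and
$\alpha = type_t(x)$ is the tree type of $x$. As before, $\Delta \in \wp_{<\infty}(Q \times \mathbb{D})$ is a finite
collection of execution threads. $\mathcal{N}_{\text{ATRA}}$ is the set of all node configurations. A tree
configuration is just a finite set of node configurations, like
$\{\langle \epsilon, \alpha, \gamma, \Delta\rangle, \langle 1211, \alpha', \gamma', \Delta'\rangle, \dots\}$. We call
$\mathcal{T}_{\text{ATRA}} = \wp_{<\infty}(\mathcal{N}_{\text{ATRA}})$ the set of all tree configurations.

We define the non-moving relation $\rightarrow_{\epsilon}$ over node configurations just as on page 10. As
a difference with the ARA model, we have two types of moving relations: the first-child
relation $\rightarrow_{\triangledown}$, to move to the leftmost child, and the next-sibling relation
$\rightarrow_{\triangleright}$, to move to the next sibling to the right.
The relations $\rightarrow_{\triangledown}$ and $\rightarrow_{\triangleright}$ are defined, for any
$\alpha_1 \in \{\triangledown, \bar{\triangledown}\} \times \{\triangleright, \bar{\triangleright}\}$,
$\gamma, \gamma_1 \in \mathbb{A} \times \mathbb{D}$, $h \in \{\triangleright, \bar{\triangleright}\}$,
$v \in \{\triangledown, \bar{\triangledown}\}$, as follows

$$\langle x, (\triangledown, h), \gamma, \Delta\rangle \rightarrow_{\triangledown} \langle x{\cdot}1, \alpha_1, \gamma_1, \Delta_{\triangledown}\rangle \qquad (5.1)$$

$$\langle x{\cdot}i, (v, \triangleright), \gamma, \Delta\rangle \rightarrow_{\triangleright} \langle x{\cdot}(i+1), \alpha_1, \gamma_1, \Delta_{\triangleright}\rangle \qquad (5.2)$$

iff (i) the configuration is 'moving' (i.e., all the threads $\langle q, d\rangle$ contained in $\Delta$ are of the
form $\delta(q) = \triangledown q'$ or $\delta(q) = \triangleright q'$); and (ii) for $\odot \in \{\triangledown, \triangleright\}$,
$\Delta_{\odot} = \{\langle q', d\rangle \mid \langle q, d\rangle \in \Delta, \delta(q) = \odot q'\}$.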
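Conditions (i) and (ii) amount to a simple filter on the set of threads. The sketch below is an
illustration under our own encoding: a thread is a pair (state, datum), the transition function is
restricted to moving states and given as a dictionary, and the direction names `"child"` and
`"sibling"` as well as the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Thread = Tuple[str, int]                    # (state, data value)
# delta restricted to moving states: state -> (direction, successor state)
MovingDelta = Dict[str, Tuple[str, str]]    # direction is "child" or "sibling"


def is_moving(threads: Set[Thread], delta: MovingDelta) -> bool:
    """Condition (i): every thread sits in a state whose transition is a move."""
    return all(q in delta for q, _ in threads)


def advance(threads: Set[Thread], delta: MovingDelta, direction: str) -> Set[Thread]:
    """Condition (ii): keep each datum and follow only the moves in `direction`."""
    if not is_moving(threads, delta):
        raise ValueError("configuration is not moving")
    return {(delta[q][1], d) for q, d in threads if delta[q][0] == direction}
```

A thread whose state moves to the first child contributes nothing to the next-sibling successor, and
vice versa; the register content travels unchanged with the thread.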

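Let $\rightarrow \; := \; \rightarrow_{\epsilon} \cup \rightarrow_{\triangledown} \cup \rightarrow_{\triangleright} \; \subseteq \mathcal{N}_{\text{ATRA}} \times \mathcal{N}_{\text{ATRA}}$.
Note that through $\rightarrow$ we obtain a run over a branch of the tree (if we think about the
underlying binary tree according to the first-child and next-sibling relations). In order to maintain
all information about the run over all branches we need to lift this relation to tree configurations.
We define the transition between tree configurations, which we write $\twoheadrightarrow$. This corresponds
to applying a 'non-moving' $\rightarrow_{\epsilon}$ to a node configuration, or to applying a 'moving'
$\rightarrow_{\triangledown}$, $\rightarrow_{\triangleright}$, or both to a node configuration
according to its type. That is, we define $S_1 \twoheadrightarrow S_2$ iff one of the following conditions holds:

(1) $S_1 = \{\rho\} \cup S'$, $S_2 = \{\tau\} \cup S'$, $\rho \rightarrow_{\epsilon} \tau$;

(2) $S_1 = \{\rho\} \cup S'$, $S_2 = \{\tau\} \cup S'$, $\rho = \langle x, (\triangledown, \bar{\triangleright}), \gamma, \Delta\rangle$, $\rho \rightarrow_{\triangledown} \tau$;

(3) $S_1 = \{\rho\} \cup S'$, $S_2 = \{\tau\} \cup S'$, $\rho = \langle x, (\bar{\triangledown}, \triangleright), \gamma, \Delta\rangle$, $\rho \rightarrow_{\triangleright} \tau$;

(4) $S_1 = \{\rho\} \cup S'$, $S_2 = \{\tau_1, \tau_2\} \cup S'$, $\rho = \langle x, (\triangledown, \triangleright), \gamma, \Delta\rangle$, $\rho \rightarrow_{\triangledown} \tau_1$, $\rho \rightarrow_{\triangleright} \tau_2$.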

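A run over a data tree $t = \mathbf{a} \otimes \mathbf{d}$ is a nonempty sequence
$S_1 \twoheadrightarrow \cdots \twoheadrightarrow S_n$ with $S_1 = \{\langle \epsilon, \alpha_0, \gamma_0, \Delta_0\rangle\}$ and
$\Delta_0 = \{\langle q_I, \mathbf{d}(\epsilon)\rangle\}$ (i.e., the thread consisting of the initial state
with the root's datum), such that for every $i \in [n]$ and $\langle x, \alpha, \gamma, \Delta\rangle \in S_i$:
(1) $x \in pos(t)$; (2) $\gamma = t(x)$; and (3) $\alpha = type_t(x)$. As before, we say that the run is
accepting if

$$S_n \subseteq \{\langle x, \alpha, \gamma, \emptyset\rangle \mid \langle x, \alpha, \gamma, \emptyset\rangle \in \mathcal{N}_{\text{ATRA}}\}.$$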

Note that the transition relation that defines the run replaces a node configuration in
a given position by one or two node configurations in children positions of the fcns
(first-child next-sibling) coding. Therefore the following holds.

Remark 5.2. Every run is such that any of its tree configurations never contains two node
configurations in a descendant/ancestor relation of the fcns coding.
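The first-child next-sibling coding used in Remark 5.2 is easy to make explicit. In the sketch below
positions are tuples of child indices (the empty tuple is the root), and the functions `fcns_parent`,
`is_fcns_ancestor` and `is_antichain` are illustrative names of ours, not notation from the paper.

```python
from typing import Iterable, Optional, Tuple

Pos = Tuple[int, ...]     # () is the root; (1, 2) is the 2nd child of the 1st child


def fcns_parent(x: Pos) -> Optional[Pos]:
    """Parent of x in the binary first-child next-sibling coding."""
    if not x:
        return None
    if x[-1] == 1:
        return x[:-1]                    # a first child hangs below its tree parent
    return x[:-1] + (x[-1] - 1,)         # any other child hangs below its left sibling


def is_fcns_ancestor(x: Pos, y: Pos) -> bool:
    """True iff x is a strict ancestor of y in the fcns coding."""
    p = fcns_parent(y)
    while p is not None:
        if p == x:
            return True
        p = fcns_parent(p)
    return False


def is_antichain(positions: Iterable[Pos]) -> bool:
    """The shape Remark 5.2 guarantees for the positions of a tree configuration."""
    ps = list(positions)
    return not any(is_fcns_ancestor(x, y) for x in ps for y in ps)
```

For example, positions `(1, 1)` and `(2,)` may coexist in a tree configuration, while `(1,)` and
`(2,)` may not, since `(2,)` lies below `(1,)` in the coding.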



The ATRA model is closed under all boolean operations [2l]. However, the extensions 
introduced guess and spread, while adding expressive power, are not closed under comple- 
mentation as a trade-off for decidability. It is not surprising that the same properties as for 
the case of data words apply here. 

Proposition 5.3. ATRA(spread, guess) models have the following properties: 

(i) they are closed under union, 

(ii) they are closed under intersection, 

(iii) they are not closed under complementation. 

Example 5.4. We show an example of the expressiveness that guess adds to ATRA. Although
as a corollary of Proposition 3.2 we have that the ATRA(guess, spread) class is more
expressive than ATRA, we give an example that inherently uses the tree structure of the
model. We force the node at position 2 and the node at position $1{\cdot}1$ of a data tree
to have the same data value without any further data constraints. Note that this datum
does not necessarily have to appear at some common ancestor of these nodes. Consider the
ATRA(guess) defined over $\mathbb{A} = \{a\}$ with

$$\delta(q_0) = \mathsf{guess}(q_1), \quad \delta(q_1) = \triangledown q_2, \quad \delta(q_2) = q_3 \wedge q_4,$$

$$\delta(q_3) = \triangledown q_5, \quad \delta(q_4) = \triangleright q_5, \quad \delta(q_5) = \mathsf{eq}.$$

Figure 6: Two indistinguishable data trees for ATRA.

For the data trees of Figure 6 any ATRA either accepts both, or rejects both. This is
because when a thread is at position 1, and performs a moving operation splitting into two
threads, one at position $1{\cdot}1$, the other at position 2, neither of these configurations contains
the data value 2. Otherwise, the automaton should have read the data value 2, which is not
the case. But then, we see that the continuation of the run from the node configuration at
position 2 and at position 1 are isomorphic independently of which data values we choose
(either the data 2 and 3, or the data 2 and 2). Hence, both trees are accepted or rejected.
However, the ATRA(guess) we just built distinguishes them, as it can introduce the data
value 2 in the configuration, without the need of reading it from the tree. Equivalently, the
property of "there are two leaves with the same data values" is expressible in ATRA(guess)
and not in ATRA.
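The effect of the automaton of Example 5.4 can be mimicked by a few lines of code. This is only an
abstraction of its runs, not a simulation of the formal semantics, and the two sample trees are
hypothetical stand-ins for those of Figure 6, chosen to match the description above (data 2 and 2
versus data 2 and 3 at positions $1{\cdot}1$ and 2).

```python
from typing import Dict, Tuple

Pos = Tuple[int, ...]
Tree = Dict[Pos, Tuple[str, int]]        # position -> (label, data value)


def accepts(tree: Tree) -> bool:
    """Guess a datum d at the root, then require d at positions 1.1 and 2."""
    if (1, 1) not in tree or (2,) not in tree:
        return False                     # the required moves would be impossible
    # Only data values occurring in the tree can make both equality tests succeed,
    # so it is enough to try those as guesses.
    candidates = {datum for _, datum in tree.values()}
    return any(tree[(1, 1)][1] == d and tree[(2,)][1] == d for d in candidates)


# Hypothetical trees in the spirit of Figure 6.
t_same: Tree = {(): ("a", 1), (1,): ("a", 1), (1, 1): ("a", 2), (2,): ("a", 2)}
t_diff: Tree = {(): ("a", 1), (1,): ("a", 1), (1, 1): ("a", 2), (2,): ("a", 3)}
```

The guessed value never has to be read on the way down, which is exactly what an automaton without
guess cannot imitate.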




5.1. Emptiness problem. We show that the emptiness problem for this model is decidable,
reusing the results of Part 1. We remind the reader that the decidability of the
emptiness of ATRA was proved in [23]. Here we extend the approach used for ARA and
show the decidability of the two extensions spread and guess.

Theorem 5.5. The emptiness problem of ATRA(guess, spread) is decidable.



Proof. The proof goes as follows. We will reuse the wqo $\precsim$ used in Section 3.2, which here
is defined over the node configurations. The only difference is that, to use $\precsim$ over
$\mathcal{N}_{\text{ATRA}}$, we work with tree types instead of word types. Since $\rightarrow_{\triangleright}$ and
$\rightarrow_{\triangledown}$ are analogous, by the same proof as in Lemma 3.10 we obtain the following.

Lemma 5.6. $(\mathcal{N}_{\text{ATRA}}, \rightarrow)$ is rdc with respect to $(\mathcal{N}_{\text{ATRA}}, \precsim)$.

We now lift this result to tree configurations. We instantiate Proposition 2.11 by taking
$\rightarrow_1$ as $\rightarrow$, $\leq$ as $\precsim$, and taking $\leq_{\wp}$ the majoring order over
$(\mathcal{N}_{\text{ATRA}}, \precsim)$. We take $\rightarrow_2$ to be $\twoheadrightarrow$
as it verifies the hypothesis demanded in the Lemma. As a result we obtain the following.

Lemma 5.7. $(\mathcal{T}_{\text{ATRA}}, \twoheadrightarrow)$ is rdc with respect to $(\mathcal{T}_{\text{ATRA}}, \leq_{\wp})$.

Hence, condition (1) of Proposition 2.7 is met. Let us write $\equiv$ for the equivalence
relation over $\mathcal{T}_{\text{ATRA}}$ such that $S \equiv S'$ iff $S \leq_{\wp} S'$ and $S' \leq_{\wp} S$.
Similarly as for Part 1, we have that $(\mathcal{T}_{\text{ATRA}}/{\equiv}, \twoheadrightarrow)$ is finitely
branching and effective. That is, the $\twoheadrightarrow$-image of
any configuration has only a finite number of configurations up to isomorphism of the data
values contained (remember that only equality between data values matters), and
representatives for every class are computable. Hence, we have that
$(\mathcal{T}_{\text{ATRA}}/{\equiv}, \leq_{\wp})$ verifies
condition (2) of Proposition 2.7. Finally, condition (3) holds as
$(\mathcal{T}_{\text{ATRA}}/{\equiv}, \leq_{\wp})$ is a wqo (by
Proposition 2.10) that is a computable relation. We conclude the proof by the following
obvious statement.

Lemma 5.8. The set of accepting tree configurations is downwards closed with respect to $\leq_{\wp}$.

Hence, by Lemma 2.8, we conclude as before that the emptiness problem for the class
ATRA(guess, spread) is decidable. □



6. Forward-XPath 

We consider a navigational fragment of XPath 1.0 with data equality and inequality. In
particular this logic is here defined over data trees. However, an XML document may typically
have not one data value per node, but a set of attributes, each carrying a data value. This
is not a problem since every attribute of an XML element can be encoded as a child node
in a data tree labeled by the attribute's name (cf. Section 6.3). Thus, all the decidability
results hold also for XPath with attributes over XML documents.

Let us define a simplified syntax for this logic. XPath is a two-sorted language, with path
expressions ($\alpha, \beta, \dots$) and node expressions ($\varphi, \psi, \dots$). We write XPath($\mathcal{O}$, =) to denote
the data-aware fragment with the set of axes O C {|, —)•,—)•* f*}. It is defined 
by mutual recursion as follows, 

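$$\alpha, \beta ::= o \mid [\varphi] \mid \alpha\beta \mid \alpha \cup \beta \qquad o \in \mathcal{O} \cup \{\varepsilon\}$$

$$\varphi, \psi ::= a \mid \neg\varphi \mid \varphi \vee \psi \mid \varphi \wedge \psi \mid \langle\alpha\rangle \mid \langle\alpha = \beta\rangle \mid \langle\alpha \neq \beta\rangle \qquad a \in \mathbb{A}$$

where $\mathbb{A}$ is a finite alphabet. A formula of XPath($\mathcal{O}$, =) is either a node expression or a
path expression. We define the 'forward' set of axes as
$\mathfrak{F} := \{\downarrow, \downarrow^*, \rightarrow, \rightarrow^*\}$, and consequently the
fragment 'forward-XPath' as XPath($\mathfrak{F}$, =). We also refer by XPath$^{\varepsilon}$($\mathfrak{F}$, =) to the
fragment considered in [21] where data tests are of the restricted form
$\langle\varepsilon = \alpha\rangle$ or $\langle\varepsilon \neq \alpha\rangle$.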

There have been efforts to extend XPath to have the full expressivity of MSO, e.g.
by adding a least fix-point operator (cf. [29, Sect. 4.2]), but these logics generally lack
clarity and simplicity. However, a form of recursion can be added by means of the Kleene
star, which allows us to take the transitive closure of any path expression. Although in
general this is not enough to already have MSO [30], it does give an intuitive language
with a counting ability. By regXPath($\mathcal{O}$, =) we refer to the enriched language where path
expressions are extended by allowing the Kleene star on any path expression.

$$\alpha, \beta ::= o \mid [\varphi] \mid \alpha\beta \mid \alpha \cup \beta \mid \alpha^* \qquad o \in \mathcal{O} \cup \{\varepsilon\}$$

Let $t$ be a data tree. The semantics of XPath is defined as the set of elements (in the
case of node expressions) or pairs of elements (in the case of path expressions) selected by
the expression. The data aware expressions are the cases $\langle\alpha = \beta\rangle$ and
$\langle\alpha \neq \beta\rangle$. The formal definition of its semantics is in Figure 7. We write
$t \models \varphi$ to denote $[\![\varphi]\!]^t \neq \emptyset$, and in this
case we say that $t$ satisfies $\varphi$.

Example 6.1. In the model of Figure 1 on page 7,

^m /i[6])])f = {e, 1, 1-2}. 



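^[24] refers to XPath$^{\varepsilon}$($\mathfrak{F}$, =) as 'forward XPath'. Here, 'forward XPath' is the
unrestricted fragment XPath($\mathfrak{F}$, =), as we believe it is more appropriate.

$[\![\varepsilon]\!]^t = \{(x, x) \mid x \in pos(t)\}$
$[\![\downarrow]\!]^t = \{(x, x{\cdot}i) \mid x{\cdot}i \in pos(t)\}$
$[\![\rightarrow]\!]^t = \{(x{\cdot}i, x{\cdot}(i+1)) \mid x{\cdot}(i+1) \in pos(t)\}$
$[\![\alpha \cup \beta]\!]^t = [\![\alpha]\!]^t \cup [\![\beta]\!]^t$
$[\![\alpha^*]\!]^t = $ the reflexive transitive closure of $[\![\alpha]\!]^t$
$[\![\varphi \wedge \psi]\!]^t = [\![\varphi]\!]^t \cap [\![\psi]\!]^t$
$[\![\langle\alpha = \beta\rangle]\!]^t = \{x \in pos(t) \mid \exists y, z.\ (x, y) \in [\![\alpha]\!]^t, (x, z) \in [\![\beta]\!]^t, \mathbf{d}(y) = \mathbf{d}(z)\}$
$[\![\langle\alpha \neq \beta\rangle]\!]^t = \{x \in pos(t) \mid \exists y, z.\ (x, y) \in [\![\alpha]\!]^t, (x, z) \in [\![\beta]\!]^t, \mathbf{d}(y) \neq \mathbf{d}(z)\}$

Figure 7: Semantics of regXPath($\mathfrak{F}$, =) for a data tree $t = \mathbf{a} \otimes \mathbf{d}$.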
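The semantics of Figure 7 translates almost literally into a small evaluator. The sketch below makes
several assumptions of ours: expressions are nested tuples with hypothetical tags (`"child"`,
`"next"`, `"seq"`, `"star"`, `"eq"`, ...), a data tree is a dictionary from positions to
(label, datum) pairs whose domain is a valid tree domain, and the cases for composition, node tests,
labels, negation and $\langle\alpha\rangle$ follow the usual XPath reading. The axes $\downarrow^*$ and
$\rightarrow^*$ are obtained with `"star"`.

```python
from typing import Dict, Set, Tuple

Pos = Tuple[int, ...]
Tree = Dict[Pos, Tuple[str, int]]        # position -> (label, data value)
Pairs = Set[Tuple[Pos, Pos]]


def compose(a: Pairs, b: Pairs) -> Pairs:
    return {(x, z) for (x, y) in a for (y2, z) in b if y == y2}


def path(t: Tree, e) -> Pairs:
    """Pairs of positions selected by a path expression."""
    kind = e[0]
    if kind == "eps":
        return {(x, x) for x in t}
    if kind == "child":
        return {(x[:-1], x) for x in t if x}
    if kind == "next":
        return {(x[:-1] + (x[-1] - 1,), x) for x in t if x and x[-1] > 1}
    if kind == "test":                                   # [phi]
        return {(x, x) for x in node(t, e[1])}
    if kind == "seq":                                    # alpha beta
        return compose(path(t, e[1]), path(t, e[2]))
    if kind == "union":
        return path(t, e[1]) | path(t, e[2])
    if kind == "star":                                   # reflexive transitive closure
        closure, step = {(x, x) for x in t}, path(t, e[1])
        while True:
            bigger = closure | compose(closure, step)
            if bigger == closure:
                return closure
            closure = bigger
    raise ValueError(f"unknown path expression: {kind}")


def node(t: Tree, e) -> Set[Pos]:
    """Positions selected by a node expression."""
    kind = e[0]
    if kind == "label":
        return {x for x in t if t[x][0] == e[1]}
    if kind == "not":
        return set(t) - node(t, e[1])
    if kind == "and":
        return node(t, e[1]) & node(t, e[2])
    if kind == "or":
        return node(t, e[1]) | node(t, e[2])
    if kind == "some":                                   # <alpha>
        return {x for (x, _) in path(t, e[1])}
    if kind in ("eq", "neq"):                            # <alpha = beta>, <alpha != beta>
        a, b, same = path(t, e[1]), path(t, e[2]), kind == "eq"
        return {x for (x, y) in a for (x2, z) in b
                if x == x2 and (t[y][1] == t[z][1]) == same}
    raise ValueError(f"unknown node expression: {kind}")
```

Satisfaction $t \models \varphi$ then corresponds to `node(t, phi)` being nonempty. The closure is
computed naively, which is adequate for small examples only.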



We define sub($\varphi$) to denote the set of all substrings of $\varphi$ which are formulae,
psub($\varphi$) := $\{\alpha \mid \alpha \in$ sub($\varphi$), $\alpha$ is a path expression$\}$, and
nsub($\varphi$) := $\{\psi \mid \psi \in$ sub($\varphi$), $\psi$ is a node expression$\}$.

6.1. Key constraints. It is worth noting that XPath($\mathfrak{F}$, =), contrary to
XPath$^{\varepsilon}$($\mathfrak{F}$, =), can express unary key constraints. That is, whether for some
symbol $a$, all the $a$-elements in the tree have different data values.

Lemma 6.2. For every $a \in \mathbb{A}$ let key($a$) be the property over a data tree $t = \mathbf{a} \otimes \mathbf{d}$:
"For every two different positions $x, x' \in pos(t)$ of the tree, if $\mathbf{a}(x) = \mathbf{a}(x') = a$, then
$\mathbf{d}(x) \neq \mathbf{d}(x')$." Then, key($a$) is expressible in XPath($\mathfrak{F}$, =) for any $a$.
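As a point of reference, the property key($a$) itself is straightforward to test on a concrete data
tree. The following sketch checks it directly on a tree given as a dictionary from positions to
(label, datum) pairs; the representation and the name `key_holds` are assumptions of ours.

```python
from typing import Dict, Tuple

Pos = Tuple[int, ...]
Tree = Dict[Pos, Tuple[str, int]]        # position -> (label, data value)


def key_holds(tree: Tree, a: str) -> bool:
    """key(a): no two distinct positions labeled a carry the same data value."""
    data = [datum for (lab, datum) in tree.values() if lab == a]
    return len(data) == len(set(data))
```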

Proof. It is easy to see that the negation of this property can be tested by first guessing the
closest common ancestor of two different $a$-elements with equal datum in the underlying
first-child next-sibling binary tree. At this node, we verify the presence of two $a$-nodes
with equal datum, one accessible with a "$\downarrow^*$" relation and the other with a compound
"$\rightarrow^+\downarrow^*$" relation (hence the nodes are different). The expressibility of the
property then follows from the logic being closed under negation. The reader can check that the
following formula expresses the property

$$key(a) \equiv \neg\langle \downarrow^* [\, \langle \varepsilon[a] = \downarrow^+[a] \rangle \vee \langle \downarrow^*[a] = \rightarrow^+\downarrow^*[a] \rangle \,] \rangle$$

□

Note that while ATRA cannot express, for instance, that there are two different nodes
with the same data value, ATRA(guess) can express it. But on the other hand, ATRA(guess, spread)
cannot express the negation of the property.

Lemma 6.3. The class ATRA(guess, spread) cannot express the property "all the data values
of the data tree are different".

Proof. Ad absurdum, suppose that there exists an automaton $\mathcal{A}$ expressing the property,
and let $Q$ be its set of states. Notice that for any accepting run of minimal length there
are no more than $f(|Q|)$ consecutive $\epsilon$-transitions (i.e., transitions that are fired by an
underlying $\rightarrow_{\epsilon}$ transition on node configurations), for some fixed function $f$.

Consider a data tree $t$ with

$$pos(t) = \{\epsilon\} \cup \{1, 2, 3, \dots, N\} \cup \{1{\cdot}1, 1{\cdot}1{\cdot}1, \dots, \underbrace{1 \cdots 1}_{N \text{ times}}\}$$

for $N = 2 \cdot f(|Q|) + 4$. That is, the root of the tree has $N$ children, and the first child heads a
long branch reaching depth $N$. All positions of $t$ have different data values, and they all carry
the same label, say $a$.

Consider a minimal accepting run of $\mathcal{A}$ on $t$, $S_1 \twoheadrightarrow \cdots \twoheadrightarrow S_n$. Let $S_i$
be the first tree configuration containing a node configuration with position 2. That is, the first
configuration after the second moving transition. Since there are at most $2 \cdot f(|Q|)$ non-moving
transitions before $S_i$ in the run, we have that $i \leq 2 \cdot f(|Q|) + 2$. In particular, this means
that $S_i$ cannot contain more than $2 \cdot f(|Q|) + 2$ different data values. Therefore, there is a
position $x$ with $2 \leq x \leq N$ and a position $y$ with $y \succ 1$ such that neither $\mathbf{d}(x)$ nor
$\mathbf{d}(y)$ is in $data(S_i)$. (Remember that by definition of $t$, $\mathbf{d}(x) \neq \mathbf{d}(y)$.) This is
because there are $N - 1 = 2 \cdot f(|Q|) + 3$ different possible data values for $x$ and for $y$.

Consider $t'$ as the result of replacing $\mathbf{d}(x)$ by $\mathbf{d}(y)$ in $t$. Clearly, $t'$ does not have the
property "all the data values of the data tree are different". Now, consider the run obtained
as the result of replacing $\mathbf{d}(x)$ by $\mathbf{d}(y)$ in $S_1, \dots, S_n$. Note that this is still a run,
and it is still accepting. Therefore $\mathcal{A}$ accepts $t'$ and thus $\mathcal{A}$ does not express the
property. □



6.2. Satisfiability problem. This section is mainly dedicated to the decidability of the
satisfiability problem for XPath($\mathfrak{F}$, =), known as 'forward-XPath'. This is proved by a
reduction to the emptiness problem of the automata model ATRA(guess, spread) introduced
in Section 5.

[24] shows that ATRA captures the fragment XPath$^{\varepsilon}$($\mathfrak{F}$, =). It is immediate to see
that ATRA can also easily capture the Kleene star operator on any path formula, obtaining
decidability of regXPath$^{\varepsilon}$($\mathfrak{F}$, =). However, these decidability results cannot be
further generalized to the full unrestricted forward fragment XPath($\mathfrak{F}$, =) as ATRA is not
powerful enough to capture the full expressivity of the logic. Indeed, while XPath($\mathfrak{F}$, =) can
express that all the data values of the leaves are different, ATRA(guess, spread) cannot (Lemma 6.3).
Although ATRA(guess, spread) cannot capture XPath($\mathfrak{F}$, =), in the sequel we show that there
exists a reduction from the satisfiability of regXPath($\mathfrak{F}$, =) to the emptiness of
ATRA(guess, spread), and hence that the former problem is decidable. This result settles an open
question regarding the decidability of the satisfiability problem for the forward-XPath fragment
XPath($\mathfrak{F}$, =). The main results that will be shown in Section 6.4 are the following.

Theorem 6.4. Satisfiability of regXPath($\mathfrak{F}$, =) in the presence of DTDs (or any regular
language) and unary key constraints is decidable, non-primitive-recursive.

And hence the next corollary follows from the logic being closed under boolean operations.

Corollary 6.5. The query containment and the query equivalence problems are decidable
for XPath($\mathfrak{F}$, =).

Moreover, these decidability results hold for regXPath($\mathfrak{F}$, =) and even for two extensions:
• a navigational extension with upward axes (in Section 6.5), and
• a generalization of the data tests that can be performed (in Section 6.6).

Figure 8: From XML documents to data-trees.

6.3. Data trees and XML documents. Although our main motivation for working with
trees is related to static analysis of logics for XML documents, we work with data trees,
being a simpler formalism to work with, from where results can be transferred to the class
of XML documents. We discuss briefly how all the results we give on XPath over data trees
also hold for the class of XML documents.

While a data tree has one data value for each node, an XML document may have several
attributes at a node, each with a data value. However, every attribute of an XML element
can be encoded as a child node in a data tree labeled by the attribute's name, as in Figure 8.
This coding can be enforced by the formalisms we present below, and we can thus transfer
all the decidability results to the class of XML documents. In fact, it suffices to demand
that all the attribute symbols can only occur at the leaves of the data tree and to interpret
attribute expressions like '@attrib1' of XPath formulae as child path expressions '$\downarrow$[attrib1]'.
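The encoding of attributes as leaf children can be sketched with the standard library XML parser.
The choices below are ours and only illustrative: attributes are placed before element children,
element nodes carry a dummy datum `None`, text content is ignored, and the function name
`xml_to_data_tree` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

Pos = Tuple[int, ...]
DataTree = Dict[Pos, Tuple[str, Optional[str]]]   # position -> (label, data value)


def xml_to_data_tree(xml_text: str) -> DataTree:
    """Encode an XML document as a data tree: every attribute becomes a leaf
    child labeled by the attribute's name and carrying its value as datum."""
    tree: DataTree = {}

    def visit(elem: ET.Element, pos: Pos) -> None:
        tree[pos] = (elem.tag, None)              # elements carry a dummy datum
        index = 0
        for name, value in elem.attrib.items():   # attributes become leaf children
            index += 1
            tree[pos + (index,)] = (name, value)
        for child in elem:                        # then the element children, in order
            index += 1
            visit(child, pos + (index,))

    visit(ET.fromstring(xml_text), ())
    return tree
```

With this coding an attribute expression such as '@id' is read as the child step to a node labeled
`id`.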



6.4. Decidability of forward XPath. This section is devoted to the proof of the following
statement.

Proposition 6.6. For every $\eta \in$ regXPath($\mathfrak{F}$, =) there is a computable ATRA(guess, spread)
automaton $\mathcal{A}$ such that $\mathcal{A}$ is nonempty iff $\eta$ is satisfiable.

Markedly, the ATRA(guess, spread) class does not capture regXPath($\mathfrak{F}$, =). However,
given a formula $\eta$, it is possible to construct an automaton that tests a property that
guarantees the existence of a data tree verifying $\eta$.



Disjoint values property. To show the above proposition, we need to work with runs 
with the disjoint values property as stated next. 

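Definition 6.7. A run $S_1 \twoheadrightarrow \cdots \twoheadrightarrow S_n$ on a data tree $t$ has the disjoint values
property if for every $x{\cdot}i \in pos(t)$ and $\rho$ a moving node configuration of the run with
position $x{\cdot}i$, then

$$data(t|_{x \cdot i}) \;\cap \bigcup_{\substack{x \cdot j \in pos(t) \\ j > i}} data(t|_{x \cdot j}) \;\subseteq\; data(\rho).$$

Figure 9 illustrates this property.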



The proof of Proposition 6.6 can be sketched as follows:

(1) We show that for every nonempty automaton $\mathcal{A} \in$ ATRA(guess, spread) there is an
accepting run on a data tree with the disjoint values property.

(2) We give an effective translation from an arbitrary forward XPath formula $\eta$ to an
ATRA(guess, spread) automaton $\mathcal{A}$ such that

• any tree accepted by a run of the automaton with the disjoint values property
verifies the XPath formula $\eta$, and

• any tree verified by the formula $\eta$ is accepted by a run of the automaton with the
disjoint values property.

Figure 9: The disjoint values property states that for every position $x{\cdot}i$, the intersection of
the grey zones is present in the last configuration for $x{\cdot}i$ appearing in the run.
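The set inclusion in the disjoint values property of Definition 6.7 can be checked mechanically for
one position. The sketch below assumes a data tree given as a dictionary from positions to
(label, datum) pairs and takes the data of the moving node configuration as a plain set
`config_data`; both the representation and the function names are ours.

```python
from typing import Dict, Set, Tuple

Pos = Tuple[int, ...]
Tree = Dict[Pos, Tuple[str, int]]        # position -> (label, data value)


def subtree_data(tree: Tree, root: Pos) -> Set[int]:
    """All data values occurring in the subtree rooted at `root`."""
    return {d for pos, (_, d) in tree.items() if pos[:len(root)] == root}


def disjoint_values_at(tree: Tree, pos: Pos, config_data: Set[int]) -> bool:
    """For pos = x.i (not the root): the data shared between the subtree at x.i
    and the subtrees of its right siblings x.j, j > i, must lie in config_data."""
    parent, i = pos[:-1], pos[-1]
    right: Set[int] = set()
    for other in tree:
        if len(other) == len(pos) and other[:-1] == parent and other[-1] > i:
            right |= subtree_data(tree, other)
    return (subtree_data(tree, pos) & right) <= config_data
```

The property of a whole run requires this inclusion at every non-root position, with `config_data`
taken from the moving node configuration of the run at that position.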

We start by proving the disjoint values property normal form. 

Proposition 6.8. For any nonempty automaton $\mathcal{A} \in$ ATRA(guess, spread) there exists an accepting run over some data tree with the disjoint values property.

Proof. Given any accepting run $\mathcal{S}_1 \rightarrow \cdots \rightarrow \mathcal{S}_n$ on a data tree $t = \mathbf{a} \otimes \mathbf{d}$, we show how to modify the run and the tree in order to satisfy the disjoint values property. We only need to replace some of the data values, so that the resulting tree and accepting run will be essentially the same.

The idea is as follows. For a given position $x \cdot k$ of the tree, we consider the moving node configuration of the run at position $x \cdot k$, and we replace all the data values from $t|_{x \cdot k}$ by fresh ones except for those present in the node configuration, obtaining a new data tree $t'$. We also make the same replacement of data values for all node configurations in the run of nodes below $x \cdot k$. Thus, we end up with a modified data tree $t'$ and accepting run that satisfy the disjoint values property at $x \cdot k$. That is, such that

$$\mathrm{data}(t'|_{x \cdot k}) \;\cap \bigcup_{\substack{x \cdot j \,\in\, \mathrm{pos}(t) \\ j > k}} \mathrm{data}(t'|_{x \cdot j}) \;\subseteq\; \mathrm{data}(\rho_{x \cdot k})$$

where $\rho_{x \cdot k}$ is the moving node configuration at position $x \cdot k$ of the run. If we repeat this procedure for all nodes of the tree, we obtain a run and tree with the disjoint values property. Next, we formalize this transformation.

Take any $x \cdot k \in \mathrm{pos}(t)$, and let $\rho_{x \cdot k} \in \mathcal{S}_i$ for some $i$ be such that $\rho_{x \cdot k}$ is a moving node configuration with position $x \cdot k$. Consider any injective function

$$\hat{f} : \mathrm{data}(t|_{x \cdot k}) \setminus \mathrm{data}(\rho_{x \cdot k}) \to \mathbb{D} \setminus \mathrm{data}(t)$$

and let $f : \mathrm{data}(t|_{x \cdot k}) \to \mathbb{D}$ be such that $f(d) = d$ if $d \in \mathrm{data}(\rho_{x \cdot k})$, and $f(d) = \hat{f}(d)$ otherwise. Note that $f$ is injective. Let us consider then $\mathcal{S}'_1, \dots, \mathcal{S}'_n$, where $\mathcal{S}'_j$ consists in replacing every node configuration $\rho \in \mathcal{S}_j$ with $h(\rho)$, where $h(\rho) = f(\rho)$ if $\rho$ is a node configuration of a node below $x \cdot k$, and $h(\rho) = \rho$ otherwise. (By $f(\rho)$ we denote the replacement of every data value $d$ by $f(d)$ in $\rho$.) Take $t'$ to be the data tree that results from the replacement in $t$ of every data value of a position $y \succ x \cdot k$ by $f(\mathbf{d}(y))$.

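Claim 1.

$$\mathrm{data}(t'|_{x \cdot k}) \;\cap \bigcup_{\substack{x \cdot j \,\in\, \mathrm{pos}(t) \\ j > k}} \mathrm{data}(t'|_{x \cdot j}) \;\subseteq\; \mathrm{data}(\rho_{x \cdot k}).$$

Proof. The only difference between $t$ and $t'$ is in the data values below $x \cdot k$. Any node $y$ of $t'$ which is below $x \cdot k$ contains a data value $f(\mathbf{d}(y))$ such that

• $f(\mathbf{d}(y))$ is in $\mathrm{data}(\rho_{x \cdot k})$, or

• $f(\mathbf{d}(y))$ is not in $\mathrm{data}(t)$, and therefore it is not in $\bigcup_{x \cdot j \in \mathrm{pos}(t),\, j > k} \mathrm{data}(t'|_{x \cdot j})$.

□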

Claim 2. $\mathcal{S}'_1 \rightarrow \cdots \rightarrow \mathcal{S}'_n$ is an accepting run of $\mathcal{A}$ over $t'$.

Proof. Take any leaf $y$ which is rightmost (i.e., with no siblings to its right) and consider the sequence of node configurations $\rho_1 \in \mathcal{S}_1, \dots, \rho_n \in \mathcal{S}_n$ that are ancestors of $y$ in the first-child next-sibling underlying tree structure. There is exactly one such configuration in each $\mathcal{S}_i$ due to Remark 5.2. This is the 'sub-run' that leads to $y$: for every $i$, either $\rho_i = \rho_{i+1}$ or $\rho_{i+1}$ is obtained from $\rho_i$ by a transition. Let $\rho'_j = h(\rho_j)$ for all $j \in [n]$.

• If $y \succeq x \cdot k$, take $\ell$ to be the last index such that $\rho_\ell = \rho_{x \cdot k}$ (there must be one). Note that $\rho'_j = f(\rho_j)$ for every $\ell < j \leq n$. The sequence $\rho'_{\ell+1} \in \mathcal{S}'_{\ell+1}, \dots, \rho'_n \in \mathcal{S}'_n$ is isomorphic, modulo renaming of data values, to $\rho_{\ell+1}, \dots, \rho_n$ since $f$ is injective. We then have that $\rho'_1, \dots, \rho'_\ell, \dots, \rho'_n$ is a correct run on node configurations, since

  - $\rho'_1, \dots, \rho'_\ell$ is equal to $\rho_1, \dots, \rho_\ell$ (it is not modified by $h$),

  - $\rho'_{\ell+1}, \dots, \rho'_n$ is isomorphic to $\rho_{\ell+1}, \dots, \rho_n$ (we apply an injection $f$ to every data value), and

  - the pair $(\rho'_\ell, \rho'_{\ell+1})$ is isomorphic to $(\rho_\ell, \rho_{\ell+1})$, as

    * $\rho'_\ell = f(\rho_{x \cdot k}) = \rho_{x \cdot k} = \rho_\ell$,

    * $\rho'_{\ell+1} = f(\rho_{\ell+1})$ is isomorphic to $\rho_{\ell+1}$, and

    * $\mathrm{data}(\rho'_\ell) \cap \mathrm{data}(\rho'_{\ell+1}) = \mathrm{data}(\rho_\ell) \cap \mathrm{data}(\rho_{\ell+1})$, since $f$ is the identity on $\mathrm{data}(\rho_\ell)$ and does not send any data value from $\mathbb{D} \setminus \mathrm{data}(\rho_\ell)$ to $\mathrm{data}(\rho_\ell)$.

• If $y \not\succeq x \cdot k$, then nothing was modified: $\rho'_1 = \rho_1 \in \mathcal{S}_1, \dots, \rho'_n = \rho_n \in \mathcal{S}_n$.

In any case, we have that $\rho'_1, \dots, \rho'_n$ is a correct run on node configurations. This means that the modified data values are innocuous for the run. As the structure of the run is not changed, this implies that $\mathcal{S}'_1 \rightarrow \cdots \rightarrow \mathcal{S}'_n$ is an accepting run (that verifies the disjoint values property for $x \cdot k$ by the previous Claim). □

If we perform the same procedure for every position of the tree, we end up with an 
accepting run and tree with the disjoint values property. □ 
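The renaming step used in the proof of Proposition 6.8 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: trees are nested pairs `(datum, children)`, data values are integers, the set `kept` stands for $\mathrm{data}(\rho_{x \cdot k})$, and the run itself is not modelled. All names are hypothetical.

```python
from itertools import count


def subtree_data(tree):
    """All data values occurring in a tree given as (datum, [children])."""
    datum, children = tree
    values = {datum}
    for child in children:
        values |= subtree_data(child)
    return values


def make_renaming(subtree, kept, used):
    """Build f: the identity on `kept`, and an injection into values
    outside `used` (playing the role of D minus data(t)) elsewhere."""
    fresh = (v for v in count() if v not in used)
    f = {}
    for d in sorted(subtree_data(subtree)):
        f[d] = d if d in kept else next(fresh)
    return f


def apply_renaming(tree, f):
    """Replace every datum d of the tree by f(d)."""
    datum, children = tree
    return (f[datum], [apply_renaming(child, f) for child in children])
```

For instance, with `t = (0, [(1, [(2, []), (3, [])]), (2, [])])`, renaming the first child with `kept={1}` and `used=subtree_data(t)` keeps the value 1 and moves 2 and 3 to values that do not occur in `t`, so the renamed subtree no longer shares the value 2 with its right sibling.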

We define a translation from $\mathrm{regXPath}(\mathfrak{F}, =)$ formulae to ATRA(guess, spread). Let $\eta$ be a $\mathrm{regXPath}(\mathfrak{F}, =)$ formula and let $\mathcal{A}$ be the corresponding ATRA(guess, spread) automaton defined by the translation. We show that (i) if a data tree $t$ is accepted by $\mathcal{A}$ by a run verifying the disjoint values property, then $t \models \eta$, and in turn (ii) if $t \models \eta$, then $t$ is accepted by $\mathcal{A}$. Thus, by the disjoint values normal form (Proposition 6.8) we obtain our main result of Proposition 6.6, which by decidability of ATRA(guess, spread) (Theorem 5.5) implies that the satisfiability problem for $\mathrm{regXPath}(\mathfrak{F}, =)$ is decidable.

Normal form. For succinctness and simplicity of the translation, we assume that $\eta$ is in a normal form such that the $\downarrow$-axis is interpreted as the leftmost child. To obtain this normal form, it suffices to replace every appearance of '$\downarrow$' by '$\downarrow\rightarrow^*$'. Also, we assume that every data test subformula of the form $\langle \alpha = \beta \rangle$ or $\langle \alpha \neq \beta \rangle$ is such that $\alpha = \varepsilon$, $\alpha = {\downarrow}\gamma$, or $\alpha = {\rightarrow}\gamma$ for some $\gamma$, and idem for $\beta$. This will simplify the translation and the proofs for correctness. It is easy to see that every expression $\langle \alpha = \beta \rangle$ can be effectively transformed into a disjunction of formulae of the aforementioned form.

The translation. Let $\eta$ be a $\mathrm{regXPath}(\mathfrak{F}, =)$ node expression of the above form in negation normal form (nnf for short). Let $t$ be a data tree and let $z_0 \preceq_{fcns} z_n$ be two positions of $t$ at distance $n \in \mathbb{N}_0$ such that, if $z_0 \neq z_n$, the path from $z_0$ to $z_n$ in the fcns coding is $z_0, z_1, \dots, z_{n-1}, z_n$ for some $z_1, \dots, z_{n-1}$. We define $\mathrm{str}(z_0, z_n) \in \mathbb{A}_\eta^*$ to be the string $a_1 S_1 \cdots a_n S_n$ where $a_i$ is either $\rightarrow$ or $\downarrow$ depending on the relation between $z_{i-1}$ and $z_i$, and $S_i = \{\psi \in \mathrm{nsub}(\eta) \mid z_i \in [\![\psi]\!]^t\}$. If $n = 0$, then $\mathrm{str}(z_0, z_0) = \varepsilon$. Note that $\mathrm{str}(z_0, z_n)$ does not take into account $z_0$, since we are working under the normal form where no path subformula makes tests on the node where it starts. For every path expression $\alpha \in \mathrm{psub}(\eta)$, consider a deterministic complete finite automaton $\mathcal{H}_\alpha$ over the alphabet $\mathbb{A}_\eta = 2^{\mathrm{nsub}(\eta)} \cup \{\downarrow, \rightarrow\}$ which corresponds to that regular expression, in the sense that the following claim holds.

Claim 3. For every $\alpha \in \mathrm{psub}(\eta)$, and $x \preceq_{fcns} y$, we have that $\mathcal{H}_\alpha$ accepts $\mathrm{str}(x, y)$ if, and only if, $(x, y) \in [\![\alpha]\!]^t$.

We assume the following names of its components: $\mathcal{H}_\alpha = (\mathbb{A}_\eta, \delta_\alpha, Q_\alpha, 0, F_\alpha)$, where $Q_\alpha \subseteq \mathbb{N}_0$ is the finite set of states and $0 \in Q_\alpha$ is the initial state. We assume that $Q_\alpha$ is partitioned into moving and testing states, such that in every accepting run the states of the run alternate between moving and testing states, starting in the moving state 0.
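To make the string read by the path automata concrete, here is a small Python sketch that computes the sequence of moves between two positions in the first-child next-sibling coding and interleaves it with the sets of subformulae holding at each visited position. Positions are modelled as tuples of 0-based child indices, and `holds` is a hypothetical callback returning the subformulae true at a position; both are assumptions made for this illustration only.

```python
def fcns_moves(x, y):
    """Moves ('↓' = first child, '→' = next sibling) leading from x to y
    in the fcns coding, or None if y is not reachable from x."""
    if x:
        prefix, start = x[:-1], x[-1]
        if len(y) < len(x) or y[:len(x) - 1] != prefix or y[len(x) - 1] < start:
            return None
        # First slide right among the siblings of x.
        moves = ['→'] * (y[len(x) - 1] - start)
        rest = y[len(x):]
    else:
        moves, rest = [], y
    # Then, for each further level: go to the first child, slide right.
    for c in rest:
        moves.append('↓')
        moves.extend(['→'] * c)
    return moves


def fcns_string(x, y, holds):
    """Interleave each move with the set of subformulae holding at the
    position reached by that move (the start position x is not tested)."""
    moves = fcns_moves(x, y)
    if moves is None:
        return None
    word, z = [], x
    for m in moves:
        z = z + (0,) if m == '↓' else z[:-1] + (z[-1] + 1,)
        word += [m, holds(z)]
    return word
```

Feeding the resulting list to a deterministic automaton for a path expression would then amount to an ordinary run over a finite word.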

We next show how to translate $\eta$ into an ATRA(guess, spread) automaton $\mathcal{A}$. For the sake of readability we define the transitions as positive boolean combinations of $\vee$ and $\wedge$ over the set of basic tests and states. Any of these (take for instance $\delta(q) = (\mathrm{store}(q_1) \wedge \triangledown q_2) \vee (q_3 \wedge a)$) can be rewritten into an equivalent ATRA with at most one boolean connector per transition (as in Definition 5.1) in polynomial time. The most important cases are those relative to the following data tests:

(1) $\langle \alpha = \beta \rangle$ (2) $\langle \alpha \neq \beta \rangle$ (3) $\neg\langle \alpha = \beta \rangle$ (4) $\neg\langle \alpha \neq \beta \rangle$

We define the ATRA(guess, spread) automaton

$$\mathcal{A} := (\mathbb{A}, Q, (\!|\eta|\!), \delta)$$

with

Q ■■= m, ^«Pf , Hfest,, . I G nsub-(r?), 

a,/3 G psub"(77),® G {=, ^/}, i G Qa,j e Q/?} 

where $\mathrm{op}^{\neg}$ is the smallest superset of $\mathrm{op}$ closed under negation under nnf, i.e., if $\varphi \in \mathrm{op}^{\neg}(\eta)$ then $\mathrm{nnf}(\neg\varphi) \in \mathrm{op}^{\neg}(\eta)$ (where nnf is defined as shown in Table 1). The idea is that a state
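$$
\begin{array}{lcl}
\mathrm{nnf}(\varphi \wedge \psi) & := & \mathrm{nnf}(\varphi) \wedge \mathrm{nnf}(\psi) \\
\mathrm{nnf}(\varphi \vee \psi) & := & \mathrm{nnf}(\varphi) \vee \mathrm{nnf}(\psi) \\
\mathrm{nnf}(\neg(\varphi \wedge \psi)) & := & \mathrm{nnf}(\neg\varphi) \vee \mathrm{nnf}(\neg\psi) \\
\mathrm{nnf}(\neg(\varphi \vee \psi)) & := & \mathrm{nnf}(\neg\varphi) \wedge \mathrm{nnf}(\neg\psi) \\
\mathrm{nnf}(\neg\neg\varphi) & := & \mathrm{nnf}(\varphi) \\
\mathrm{nnf}(a) & := & a \\
\mathrm{nnf}(\neg a) & := & \neg a \\
\mathrm{nnf}(\alpha\beta) & := & \mathrm{nnf}(\alpha)\,\mathrm{nnf}(\beta) \\
\mathrm{nnf}(\alpha^*) & := & (\mathrm{nnf}(\alpha))^* \\
\mathrm{nnf}([\varphi]) & := & [\mathrm{nnf}(\varphi)] \\
\mathrm{nnf}(o) & := & o \qquad (o \in \mathfrak{F}) \\
\mathrm{nnf}(\langle \alpha \circledast \beta \rangle) & := & \langle \mathrm{nnf}(\alpha) \circledast \mathrm{nnf}(\beta) \rangle \\
\mathrm{nnf}(\neg\langle \alpha \circledast \beta \rangle) & := & \neg\langle \mathrm{nnf}(\alpha) \circledast \mathrm{nnf}(\beta) \rangle \\
\mathrm{nnf}(\langle \alpha \rangle) & := & \langle \mathrm{nnf}(\alpha) \rangle
\end{array}
$$

Table 1: Definition of the Negation Normal Form for XPath.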
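The rules of Table 1 can be read as a recursive procedure. The Python sketch below is one possible rendering, assuming a hypothetical tuple encoding of expressions: `('lab', a)`, `('not', φ)`, `('and', φ, ψ)`, `('or', φ, ψ)`, `('exists', α)` for $\langle\alpha\rangle$ and `('cmp', op, α, β)` for $\langle\alpha \circledast \beta\rangle$ on the node side, and `('axis', o)`, `('test', φ)` for $[\varphi]$, `('cat', α, β)`, `('star', α)` on the path side. The case for a negated `('exists', α)` is not listed in the table as extracted and is an assumption made by analogy with the negated data tests.

```python
def nnf(e):
    """Push negations inwards following the rules of Table 1."""
    tag = e[0]
    if tag == 'not':
        inner = e[1]
        t = inner[0]
        if t == 'not':                       # double negation
            return nnf(inner[1])
        if t == 'and':                       # De Morgan
            return ('or', nnf(('not', inner[1])), nnf(('not', inner[2])))
        if t == 'or':                        # De Morgan
            return ('and', nnf(('not', inner[1])), nnf(('not', inner[2])))
        if t == 'lab':                       # negated label stays as is
            return e
        if t == 'cmp':                       # negated data test
            return ('not', ('cmp', inner[1], nnf(inner[2]), nnf(inner[3])))
        if t == 'exists':                    # assumption, see lead-in
            return ('not', ('exists', nnf(inner[1])))
        raise ValueError(f"cannot negate {t!r}")
    if tag in ('and', 'or', 'cat'):
        return (tag, nnf(e[1]), nnf(e[2]))
    if tag in ('star', 'test', 'exists'):
        return (tag, nnf(e[1]))
    if tag == 'cmp':
        return ('cmp', e[1], nnf(e[2]), nnf(e[3]))
    if tag in ('lab', 'axis'):
        return e
    raise ValueError(f"unknown expression {tag!r}")
```

Negations end up only in front of labels and of data tests, which is what the state set of the automaton relies on.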

$(\!|\varphi|\!)$ verifies the formula $\varphi$. A state $(\!|\alpha|\!)^{=}_{i}$ (resp. $(\!|\alpha|\!)^{\neq}_{i}$) verifies that there is a forward path in the tree ending at a node with the same (resp. different) data value as the one in the register, such that there exists a partial run of $\mathcal{H}_\alpha$ over such path that starts in a moving state $i$ and ends in a final state. Similarly, a state $(\!|\alpha|\!)^{=}_{test,i}$ or $(\!|\alpha|\!)^{\neq}_{test,i}$ verifies the same when $i$ is a testing state. A state $(\!|\alpha, \beta|\!)^{=}_{i,j}$ (resp. $(\!|\alpha, \beta|\!)^{\neq}_{i,j}$) verifies that there are two paths ending in two nodes with the same (resp. one equal, the other different) data value as that of the register, such that one path has a partial accepting run of $\mathcal{H}_\alpha$ starting in a moving state $i$, and the other has a partial accepting run of $\mathcal{H}_\beta$ starting in a moving state $j$. A state like $(\!|\alpha|\!)^{=}_{F}$ is simply to mark that the run of $\mathcal{H}_\alpha$ on a path has ended, and the only remaining task is to test for equality of the data value with respect to the register. Finally, a state $(\!|\alpha|\!)^{\neg=}_{i}$ (resp. $(\!|\alpha|\!)^{\neg\neq}_{i}$) verifies that every node reachable by $\alpha$ has different (resp. equal) data than the register, and similarly for the other universal states of the form $(\!|\cdots|\!)^{\neg\circledast}$. We first take care of the boolean connectors and the simplest tests.

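$$\delta((\!|a|\!)) := a \qquad \delta((\!|\varphi \vee \psi|\!)) := (\!|\varphi|\!) \vee (\!|\psi|\!) \qquad \delta((\!|\neg a|\!)) := \bar{a} \qquad \delta((\!|\varphi \wedge \psi|\!)) := (\!|\varphi|\!) \wedge (\!|\psi|\!)$$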

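First, we define the transitions associated to each $\mathcal{H}_\alpha$, for $i \in Q_\alpha$, $C \subseteq Q_\alpha$, $\circledast \in \{=, \neq\}$. Here, $(\!|\alpha|\!)^{\circledast}_{F}$ holds at the endpoint of a path matching $\alpha$.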



[a 



V a 



V 



test,SM,i) V >^"^"est,5.H.) ^ \/ ^a\)f 

■■= V i<\<.iS,i) A A M 

SCnsub(Q) v^eS 

Next, we focus only on the data-aware test formulae, since the tests $\langle\alpha\rangle$ and $\neg\langle\alpha\rangle$ are interpreted as the equivalent formulae $\langle\alpha = \alpha\rangle$ and $\neg\langle\alpha = \alpha\rangle$ respectively. Using the guess operator, we can easily define the cases corresponding to the data test cases (1) and (2) as follows.

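$$\delta((\!|\langle\alpha = \beta\rangle|\!)) := \mathrm{guess}((\!|\alpha, \beta|\!)^{=}) \qquad \delta((\!|\alpha, \beta|\!)^{=}) := (\!|\alpha|\!)^{=}_{0} \wedge (\!|\beta|\!)^{=}_{0} \qquad \delta((\!|\alpha|\!)^{=}_{F}) := \mathrm{eq}$$

$$\delta((\!|\langle\alpha \neq \beta\rangle|\!)) := \mathrm{guess}((\!|\alpha, \beta|\!)^{\neq}) \qquad \delta((\!|\alpha, \beta|\!)^{\neq}) := (\!|\alpha|\!)^{=}_{0} \wedge (\!|\beta|\!)^{\neq}_{0} \qquad \delta((\!|\alpha|\!)^{\neq}_{F}) := \overline{\mathrm{eq}}$$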

The test case (4) also involves an existential quantification over data values. In fact, $\neg\langle\alpha \neq \beta\rangle$ means that either

(i) there are no nodes reachable by $\alpha$,

(ii) there are no nodes reachable by $\beta$, or

(iii) there exists a data value $d$ such that both

(a) all elements reachable by $\alpha$ have datum $d$, and

(b) all elements reachable by $\beta$ have datum $d$.
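The case analysis (i) to (iii) can be checked against the direct semantics of $\neg\langle\alpha \neq \beta\rangle$ on small examples. In the Python sketch below, the two arguments stand for the sets of data values of the nodes reachable by $\alpha$ and by $\beta$ from the current position; the function names are hypothetical and the brute-force comparison is only a sanity check, not part of the construction.

```python
from itertools import combinations


def neg_neq_holds(data_alpha, data_beta):
    """Direct semantics: no alpha-endpoint and beta-endpoint carry
    two different data values."""
    return not any(d1 != d2 for d1 in data_alpha for d2 in data_beta)


def neg_neq_by_cases(data_alpha, data_beta):
    """The three cases listed in the text."""
    if not data_alpha:            # (i) nothing reachable by alpha
        return True
    if not data_beta:             # (ii) nothing reachable by beta
        return True
    # (iii) a single datum d shared by all endpoints of both paths
    return len(data_alpha) == 1 and data_alpha == data_beta


def agree_on_small_sets(universe=(0, 1, 2)):
    """Compare both definitions on every pair of subsets of `universe`."""
    subsets = [set(c) for r in range(len(universe) + 1)
               for c in combinations(universe, r)]
    return all(neg_neq_holds(a, b) == neg_neq_by_cases(a, b)
               for a in subsets for b in subsets)
```

Calling `agree_on_small_sets()` compares the two formulations on all 64 pairs of subsets of a three-element domain.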

6{l\a, Pr^) := (\a\i-^ A ^c^llf^) ■= eq S{(\a\)r) ■= eq 

SiHi^) := (V? V V^«^-® _,^(,^^)) A (>? V >^a^-® .)) A 

where ip stands for nnf (-K/?) and final = J^'^I^F ifiGFa, 

I irae otherwise. 

5Cnsub(a) ^seS 

The difficult part is the translation of the data test case (3). The main reason for this difficulty is the fact that ATRA(guess, spread) automata do not have the expressivity to make these kinds of tests. An expression $\neg\langle\alpha = \beta\rangle$ forces the set of data values reachable by an $\alpha$-path and the set of those reachable by a $\beta$-path to be disjoint. We show that nonetheless the automaton can test for a condition that is equivalent to $\neg\langle\alpha = \beta\rangle$ if we assume that the run and tree have the disjoint values property.

Example 6.9. As an example, suppose that $\eta = \neg\langle{\downarrow}\alpha = {\rightarrow}\beta\rangle$ is to be checked for satisfiability. One obvious answer would be to test separately $\alpha$ and $\beta$. If both tests succeed, one can then build a model satisfying $\eta$ out of the two witnessing trees by making sure they have disjoint sets of values. Otherwise, $\eta$ is clearly unsatisfiable. Suppose now that we have $\eta = \varphi \wedge \neg\langle{\downarrow}\alpha = {\rightarrow}\beta\rangle$, where $\varphi$ is any formula with no data tests of type (3). One could build the automaton for $\varphi$ and then ask for "$\mathrm{spread}((\!|{\downarrow}\alpha|\!)^{\neg=}_{0} \vee (\!|{\rightarrow}\beta|\!)^{\neg=}_{0})$" in the automaton. This corresponds to the property "for every data value $d$ taken into account by the automaton (as a result of the translation of $\varphi$), either all elements reachable by $\alpha$ do not have datum $d$, or all elements reachable by $\beta$ do not have datum $d$". If $\varphi$ contains a $\langle\alpha' = \beta'\rangle$ formula, this translates to a guessing of a witnessing data value $d$. Then, the use of spread takes care of this particular data value, and indeed of all other data values that were guessed to satisfy similar demands. In other words, it is not because of $d$ that $\neg\langle{\downarrow}\alpha = {\rightarrow}\beta\rangle$ will be falsified. But then, the disjoint values property ensures that no pair of nodes accessible by $\alpha$ and $\beta$ share the same datum. This is the main idea we encode next.

We define $\delta((\!|\neg\langle\alpha = \beta\rangle|\!)) := (\!|\alpha, \beta|\!)^{\neg=}_{0,0}$. Given $\neg\langle\alpha = \beta\rangle$, the automaton systematically looks for the closest common ancestor of every pair $(x, y)$ of nodes accessible by $\alpha$ and $\beta$ respectively, and tests, for every data value $d$ present in the node configuration, that either (1) all data values accessible by the remaining path of $\alpha$ are different from $d$, or (2) all data values accessible by the remaining path of $\beta$ are different from $d$.
6{l\a,f3l)-=) := spread (^a^r V 

A (V? V y^a,f3\)-^sMMi,j)^ ^ ^ ^^"'/3^t";rt,5.H,^),5,H,,)) 

A final^ A final^ 

I true otherwise, 



where final^ 



and finalf^ 



store(^a^^-) ifjGF^j, 
true otherwise. 



a, 



SCnsub(Q) ipeS 

The first line of $\delta((\!|\alpha, \beta|\!)^{\neg=}_{i,j})$ corresponds to the tests (1) and (2) above, and the third line corresponds to the cases where $x \preceq_{fcns} y$ and $y \preceq_{fcns} x$.

Next we show the correctness of this translation. We say that $\mathcal{A}$ has an accepting run from a thread $(q, d)$ on $t|_x$ if there is a sequence of tree configurations $\mathcal{S}_1 \rightarrow \cdots \rightarrow \mathcal{S}_n$ such that $\mathcal{S}_1$ is at position $x$ and contains $(q, d)$, and $\mathcal{S}_n$ is accepting. We say that there is a dvp-run if $\mathcal{S}_1 \rightarrow \cdots \rightarrow \mathcal{S}_n$ has the disjoint values property.

Lemma 6.10. For any data tree $t$,

($\Rightarrow$) if $t \models \eta$ then $\mathcal{A}$ accepts $t$ with a dvp-run, and

($\Leftarrow$) if $\mathcal{A}$ accepts $t$ with a dvp-run, then $t \models \eta$.

Proof. Let $t = \mathbf{a} \otimes \mathbf{d}$. We show that for every subformula $\varphi \in \mathrm{sub}(\eta)$ and position $x \in \mathrm{pos}(t)$,

if $t|_x \models \varphi$ then $\mathcal{A}$ has an accepting dvp-run from $((\!|\varphi|\!), d)$ on $t|_x$ for any $d \in \mathbb{D}$. ($\dagger$)

And conversely,

if $\mathcal{A}$ has an accepting dvp-run from $((\!|\varphi|\!), d)$ on $t|_x$ for some $d \in \mathbb{D}$, then $t|_x \models \varphi$. ($\ddagger$)

We suppose that $\eta$ is in nnf. We proceed by induction on the lexicographic order of $(\ell(x), |\varphi|)$, where $\ell(x)$ is the length of a maximal path from $x$ to a leaf in the fcns coding (roughly, the height of $t|_x$). The base cases when $\varphi = a$ or $\varphi = \neg a$ are easy. Suppose then that the proposition is true for all positions $y \succ_{fcns} x$, and for all $\psi \in \mathrm{sub}(\eta)$ with $|\psi| < |\varphi|$ at position $x$. The cases $\varphi = \psi_1 \wedge \psi_2$ or $\varphi = \psi_1 \vee \psi_2$ are straightforward from the inductive hypothesis. The following claim also follows from the inductive hypothesis.

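Claim 4. For any path subformula $\alpha$ of $\mathrm{psub}(\eta)$, and $y \succeq_{fcns} x$,

(a) if $(x, y) \in [\![\alpha]\!]^t$ then there is an accepting dvp-run from $((\!|\alpha|\!)^{=}_{0}, \mathbf{d}(y))$ on $t|_x$,

(b) if there is an accepting dvp-run from $((\!|\alpha|\!)^{=}_{0}, d)$ on $t|_x$, then $(x, y) \in [\![\alpha]\!]^t$ for some $y$ such that $\mathbf{d}(y) = d$,

(c) if $(x, y) \in [\![\alpha]\!]^t$, there is an accepting dvp-run from $((\!|\alpha|\!)^{\neq}_{0}, d)$ with $d \neq \mathbf{d}(y)$ on $t|_x$,

(d) if there is an accepting dvp-run from $((\!|\alpha|\!)^{\neq}_{0}, d)$ on $t|_x$, then $(x, y) \in [\![\alpha]\!]^t$ for some $y$ with $d \neq \mathbf{d}(y)$.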

Proof. By induction, suppose that Claim 4 holds for all $\alpha, x', y$ such that $(\ell(x'), |\alpha|) <_{lex} (\ell(x), |\varphi|)$. As a consequence of the inductive hypothesis on ($\dagger$), for all positions $z$ with $x \prec_{fcns} z \preceq_{fcns} y$, and for every subformula $\psi \in \mathrm{nsub}(\eta)$, if $z \in [\![\psi]\!]^t$ then there is an accepting dvp-run of $\mathcal{A}$ from $((\!|\psi|\!), d)$ on $t|_z$ for any $d \in \mathbb{D}$.

(a) Suppose first that $(x, y) \in [\![\alpha]\!]^t$. Therefore, $\mathcal{H}_\alpha$ accepts $\mathrm{str}(x, y)$ by Claim 3. Further, suppose that $S \in 2^{\mathrm{nsub}(\eta)}$ is the $(2i)$-th element of $\mathrm{str}(x, y)$, and that $z$ is the position of the $i$-th element in the path between $x$ and $y$ of the fcns coding. For any $\psi \in S$ we have that $z \in [\![\psi]\!]^t$, and applying the inductive hypothesis on ($\dagger$) we have that there is an accepting dvp-run from $((\!|\psi|\!), d)$ on $t|_z$. Hence, by definition of the transition of $(\!|\alpha|\!)^{=}_{0}$, we have that there is an accepting dvp-run from $((\!|\alpha|\!)^{=}_{0}, d')$ on $t|_x$ if we take $d' = \mathbf{d}(y)$.

(b) For any $z$ with $x \prec_{fcns} z \preceq_{fcns} y$, if there is an accepting dvp-run from $((\!|\psi|\!), d)$ on $t|_z$, by inductive hypothesis on ($\ddagger$), we have that $z \in [\![\psi]\!]^t$. Therefore, if there is an accepting dvp-run from $((\!|\alpha|\!)^{=}_{0}, d)$ on $t|_x$, by the definition of the transition relation there must be some $y \succeq_{fcns} x$ such that $\mathrm{str}(x, y)$ is accepted by $\mathcal{H}_\alpha$ and further $\mathbf{d}(y) = d$. Hence, $(x, y) \in [\![\alpha]\!]^t$.

An identical reasoning applies to show (c) and (d). □

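($\Rightarrow$) Suppose that $t|_x \models \varphi$. We focus on the data test cases. If $\varphi = \langle\alpha = \beta\rangle$, then $\delta((\!|\varphi|\!)) = \mathrm{guess}((\!|\alpha, \beta|\!)^{=})$. Since $\delta((\!|\alpha, \beta|\!)^{=}) = (\!|\alpha|\!)^{=}_{0} \wedge (\!|\beta|\!)^{=}_{0}$, we then have that for any data value $d$, there is an accepting run from $((\!|\varphi|\!), d)$ on $t|_x$ if there exists a data value $d' \in \mathbb{D}$ such that $((\!|\alpha|\!)^{=}_{0}, d')$ and $((\!|\beta|\!)^{=}_{0}, d')$ have accepting dvp-runs. Since $t|_x \models \langle\alpha = \beta\rangle$, there must be two positions $y$ and $y'$ below $x$ such that $\mathbf{d}(y) = \mathbf{d}(y')$ where $(x, y) \in [\![\alpha]\!]^t$ and $(x, y') \in [\![\beta]\!]^t$. By Claim 4(a), taking $d' = \mathbf{d}(y)$, there are accepting dvp-runs from $((\!|\alpha|\!)^{=}_{0}, d')$ and $((\!|\beta|\!)^{=}_{0}, d')$ on $t|_x$. These two dvp-runs can be simply combined to build an accepting dvp-run from $((\!|\varphi|\!), d)$. The case $\varphi = \langle\alpha \neq \beta\rangle$ is similar, using Claims 4(a) and 4(c).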



If $\varphi = \neg\langle\alpha = \beta\rangle$, then $t|_x$ is such that for every pair of paths ending in some positions $y, y'$, if the paths satisfy $\alpha$ and $\beta$ respectively, then $\mathbf{d}(y) \neq \mathbf{d}(y')$. Since $\delta((\!|\neg\langle\alpha = \beta\rangle|\!)) = (\!|\alpha, \beta|\!)^{\neg=}_{0,0}$, we show that there is an accepting run from $((\!|\alpha, \beta|\!)^{\neg=}_{0,0}, d)$, for any $d \in \mathbb{D}$. Using the inductive hypothesis on all the nodes below $x$ (as we did in Claim 4), note that every time that the automaton has an accepting dvp-run from $((\!|\psi|\!), d)$ at $x$ for some $d \in \mathbb{D}$, then $x \in [\![\psi]\!]^t$. By definition of the transition relation on $(\!|\alpha, \beta|\!)^{\neg=}$, this means that a sufficient condition for the automaton to have an accepting dvp-run from $((\!|\alpha, \beta|\!)^{\neg=}_{0,0}, d)$ is: for every node $z \succeq_{fcns} x$, if the states of $\mathcal{H}_\alpha, \mathcal{H}_\beta$ are respectively $i, j$ after reading $\mathrm{str}(x, z)$, then there is an accepting dvp-run from $(\mathrm{spread}((\!|\alpha|\!)^{\neg=}_{i} \vee (\!|\beta|\!)^{\neg=}_{j}), d)$ on $t|_z$. To simplify the argument, we can assume that the configuration contains all the data values of the tree and therefore that spread takes into account all possible data values (this is not a necessary condition, but certainly a sufficient one to have an accepting dvp-run). Also using the inductive hypothesis, there is an accepting dvp-run from $((\!|\alpha|\!)^{\neg=}_{i}, d')$ on $t|_z$ if for every node $z' \succeq_{fcns} z$ such that $\mathcal{H}_\alpha$ takes the state $i$ to a final state after reading $\mathrm{str}(z, z')$, we have $\mathbf{d}(z') \neq d'$. Therefore, there is an accepting dvp-run from $\mathrm{spread}((\!|\alpha|\!)^{\neg=}_{i} \vee (\!|\beta|\!)^{\neg=}_{j})$ if for every data value $d'$ and nodes $y, y'$ that complete the paths, it cannot be that $d' = \mathbf{d}(y) = \mathbf{d}(y')$. This is true, since $t|_x \models \neg\langle\alpha = \beta\rangle$. On the other hand, note that by the same reason, for every node $z'$ such that $(z, z') \in [\![\beta]\!]^t$ there are accepting dvp-runs for $((\!|\alpha|\!)^{\neg=}_{i}, \mathbf{d}(z'))$ and for $(\mathrm{store}((\!|\alpha|\!)^{\neg=}_{i}), d')$ from $z'$ (and respectively swapping $\alpha$ and $\beta$). Thus, by combining all the accepting dvp-runs from $(\mathrm{spread}((\!|\alpha|\!)^{\neg=}_{i} \vee (\!|\beta|\!)^{\neg=}_{j}), d)$ for every such $z$, we obtain an accepting dvp-run from $((\!|\varphi|\!), d)$ on $t|_x$.

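Finally, if $\varphi = \neg\langle\alpha \neq \beta\rangle$, then $\delta((\!|\varphi|\!)) = (\!|\neg\langle\alpha\rangle|\!) \vee (\!|\neg\langle\beta\rangle|\!) \vee \mathrm{guess}((\!|\alpha, \beta|\!)^{\neg\neq})$. Since $x \in [\![\varphi]\!]^t$, we have that either (i), (ii) or (iii) (as listed above for test case (4)) holds for $t|_x$. The first two conditions correspond to the properties for which $\neg\langle\alpha\rangle$ and $\neg\langle\beta\rangle$ are satisfied, which are equivalent to $\neg\langle\alpha = \alpha\rangle$ and $\neg\langle\beta = \beta\rangle$. By the case already shown, this means that $(\!|\neg\langle\alpha = \alpha\rangle|\!)$ or $(\!|\neg\langle\beta = \beta\rangle|\!)$ have accepting dvp-runs on $t|_x$, respectively. Since $(\!|\neg\langle\alpha = \alpha\rangle|\!)$ and $(\!|\neg\langle\alpha\rangle|\!)$ (idem with $\beta$) are treated identically by the automaton, $((\!|\varphi|\!), d)$ has an accepting dvp-run on $t|_x$.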

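We now show that the third condition corresponds to having an accepting dvp-run from $(\mathrm{guess}((\!|\alpha, \beta|\!)^{\neg\neq}), d)$ on $t|_x$ for any data value $d$. By definition of $\delta$, there is an accepting dvp-run if there is a data value $d'$ such that there are accepting dvp-runs from $((\!|\alpha|\!)^{\neg\neq}_{0}, d')$ and $((\!|\beta|\!)^{\neg\neq}_{0}, d')$. By applying the inductive hypothesis on all positions below $x$ and subformulas of $\mathrm{nsub}(\eta)$, we have the following, for $\mathcal{D}^{\alpha}_x = \{\mathbf{d}(y) \mid (x, y) \in [\![\alpha]\!]^t\}$ and $\mathcal{D}^{\beta}_x = \{\mathbf{d}(y) \mid (x, y) \in [\![\beta]\!]^t\}$.

Claim 5. For any $e \notin \mathcal{D}^{\alpha}_x$, there is an accepting dvp-run from $((\!|\alpha|\!)^{\neg=}_{0}, e)$ on $t|_x$. If $\mathcal{D}^{\alpha}_x = \{e\}$, there is an accepting dvp-run from $((\!|\alpha|\!)^{\neg\neq}_{0}, e)$ on $t|_x$. Idem for $\beta$ and $\mathcal{D}^{\beta}_x$.

Note that since $t|_x \models \neg\langle\alpha \neq \beta\rangle$, condition (i) corresponds to $\mathcal{D}^{\alpha}_x = \emptyset$, condition (ii) to $\mathcal{D}^{\beta}_x = \emptyset$, and condition (iii) to $\mathcal{D}^{\alpha}_x = \mathcal{D}^{\beta}_x = \{e\}$. Therefore, if condition (iii) holds, there are accepting dvp-runs from $((\!|\alpha|\!)^{\neg\neq}_{0}, e)$ and $((\!|\beta|\!)^{\neg\neq}_{0}, e)$, and hence also from $(\mathrm{guess}((\!|\alpha, \beta|\!)^{\neg\neq}), d)$ for any $d$, by guessing the data value $e$. Thus, there is an accepting dvp-run from $((\!|\varphi|\!), d)$ on $t|_x$ if conditions (i), (ii) or (iii) hold.

($\Leftarrow$) Consider an accepting dvp-run $\mathcal{S}_1 \rightarrow \cdots \rightarrow \mathcal{S}_n$ from $((\!|\varphi|\!), d)$ on $t|_x$. We show that $t|_x \models \varphi$.

If $\varphi = \langle\alpha = \beta\rangle$, by definition of $\delta$ the automaton guesses a data value $d'$ and creates a thread $((\!|\alpha, \beta|\!)^{=}, d')$, which means that there is an accepting dvp-run from $((\!|\alpha, \beta|\!)^{=}, d')$. Then, there are accepting dvp-runs from $((\!|\alpha|\!)^{=}_{0}, d')$ and $((\!|\beta|\!)^{=}_{0}, d')$. By Claim 4(b), there must be two nodes $y, z$ such that $(x, y) \in [\![\alpha]\!]^t$ and $(x, z) \in [\![\beta]\!]^t$ with $d' = \mathbf{d}(z) = \mathbf{d}(y)$. Hence, $t|_x \models \varphi$. The case $\varphi = \langle\alpha \neq \beta\rangle$ is treated in a similar way.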

If $\varphi = \neg\langle\alpha = \beta\rangle$, suppose ad absurdum that there is a data value $d$ and two positions $y, z$ such that $(x, y) \in [\![\alpha]\!]^t$ and $(x, z) \in [\![\beta]\!]^t$, and $\mathbf{d}(y) = \mathbf{d}(z) = d$. Let $z'$ be the closest common ancestor of $y, z$ in the fcns coding. Since there is an accepting dvp-run from $(\!|\alpha, \beta|\!)^{\neg=}_{0,0}$, there must be some states $i', j'$ assumed by $\mathcal{H}_\alpha$ and $\mathcal{H}_\beta$ after reading $\mathrm{str}(x, z')$, such that the following holds.

Claim 6. There is a moving configuration $\mathcal{S}_r$ containing $(\!|\alpha, \beta|\!)^{\neg=}_{i',j'}$ with position $z'$ such that

• if $z' = y$, then $((\!|\alpha|\!)^{\neg=}_{i'}, \mathbf{d}(y))$ is in some $\mathcal{S}_s$, $s > r$, with position $z'$,

• if $z' = z$, then $((\!|\beta|\!)^{\neg=}_{j'}, \mathbf{d}(z))$ is in some $\mathcal{S}_s$, $s > r$, with position $z'$,

• for every data value $d$ of $\mathcal{S}_r$, $((\!|\alpha|\!)^{\neg=}_{i'}, d)$ or $((\!|\beta|\!)^{\neg=}_{j'}, d)$ are in some $\mathcal{S}_s$, $s > r$, with position $z'$.

Hence, by Claim 6 there must be a configuration $\mathcal{S}_r$ with position $z'$, where $z'$ is the common ancestor of $y$ and $z$ in the fcns coding, such that for all data values $d'$ of $\mathcal{S}_r$ it is not true that $\mathbf{d}(y) = \mathbf{d}(z) = d'$. But by the fact that this is a dvp-run, this means that there cannot be any data value $d''$ such that $\mathbf{d}(y) = \mathbf{d}(z) = d''$. In particular, this holds for $d$, and thus we have a contradiction. Then, $t|_x \models \varphi$. The cases of $y \preceq_{fcns} z$ or $z \preceq_{fcns} y$ are only easier.

Finally, if $\varphi = \neg\langle\alpha \neq \beta\rangle$, then $\mathcal{A}$ must have an accepting dvp-run on $t|_x$ from $((\!|\neg\langle\alpha\rangle|\!), d)$, $((\!|\neg\langle\beta\rangle|\!), d)$, or $((\!|\alpha, \beta|\!)^{\neg\neq}, d)$ for some $d \in \mathbb{D}$. In the first two cases this means that $[\![\langle\alpha\rangle]\!]^{t|_x} = \emptyset$ or $[\![\langle\beta\rangle]\!]^{t|_x} = \emptyset$ (by reduction to the case $\neg\langle\alpha = \alpha\rangle$ already treated) and hence that $t \models \varphi$. In the latter case, by definition of $\delta$, all positions $y$ whose paths from $x$ satisfy $\alpha$ or $\beta$ are such that $\mathbf{d}(y) = d$. This implies that $t \not\models \langle\alpha \neq \beta\rangle$, and hence $t \models \varphi$. □

Lemma 6.10 together with Proposition 6.8 concludes the proof of Proposition 6.6. We then have that Theorem 6.4 holds.

Proof of Theorem 6.4. By Proposition 6.6, satisfiability of $\mathrm{regXPath}(\mathfrak{F}, =)$ is reducible to the nonemptiness problem for ATRA(guess, spread). On the other hand, we remark that ATRA(guess, spread) automata can encode any regular tree language (in particular a DTD, the core of XML Schema, or Relax NG) and are closed under intersection by Proposition 5.3. Also, the logic can express any unary key constraint as stated in Lemma 6.2. Hence, by Theorem 5.5 the decidability follows. □



Extensions 

We consider some operators that can be added to $\mathrm{regXPath}(\mathfrak{F}, =)$ while preserving the decidability of the satisfiability problem. Each of these operators can be coded as an ATRA(guess, spread) automaton, following the same lines as the translation in Section 6.4.



6.5. Allowing upward axes. Here we explore one possible decidable extension to the logic $\mathrm{regXPath}(\mathfrak{F}, =)$, whose decidability can be reduced to that of ATRA(guess, spread). In this extension we can make use of upward ($\uparrow$) and leftward ($\leftarrow$) axes to navigate the tree in a restricted way. We can test, for example, that all the ancestors of a given node labeled with $a$ have the same data value as all the descendants labeled with $b$, with the formula $\neg\langle{\uparrow^*}[a] \neq {\downarrow^*}[b]\rangle$; but we cannot test its negation.

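Let $\mathrm{regXPath}^{\mathfrak{B}}(\mathfrak{F}, =)$ be the fragment of $\mathrm{regXPath}(\mathfrak{F} \cup \mathfrak{B}, =)$, where $\mathfrak{B} := \{\uparrow, \uparrow^*, \leftarrow, {}^*\!\leftarrow\}$, defined by the grammar

$$\varphi, \psi ::= \neg a \mid a \mid \varphi \wedge \psi \mid \varphi \vee \psi \mid \langle\alpha_f\rangle \mid \langle\alpha_b\rangle \mid \langle\alpha_f \circledast \beta_f\rangle \mid \neg\langle\alpha_f \circledast \beta_f\rangle \mid \neg\langle\alpha_b = \beta_f\rangle \mid \neg\langle\alpha_b \neq \beta_f\rangle$$

with $\circledast \in \{=, \neq\}$, $a \in \mathbb{A}$, and

$$\alpha_f, \beta_f ::= [\varphi] \mid \alpha_f \beta_f \mid \alpha_f \cup \beta_f \mid o\,\alpha_f \mid (\alpha_f)^* \qquad o \in \{\downarrow, \rightarrow, \varepsilon\}$$

$$\alpha_b, \beta_b ::= [\varphi] \mid \alpha_b \beta_b \mid \alpha_b \cup \beta_b \mid o\,\alpha_b \mid (\alpha_b)^* \qquad o \in \{\uparrow, \leftarrow, \varepsilon\}.$$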

We must note that $\mathrm{regXPath}^{\mathfrak{B}}(\mathfrak{F}, =)$ contains $\mathrm{regXPath}(\mathfrak{F} \cup \mathfrak{B})$, that is, the full data-unaware fragment of XPath. We also remark that it is not closed under negation. Indeed, we cannot express the negation of "there exists an $a$ such that all its ancestors labeled $b$ have different data value", which is expressed by ${\downarrow^*}[a \wedge \neg\langle{\uparrow^*}[b] = \varepsilon\rangle]$. As shown in Proposition 3.2, if the negation of this property were expressible, then its satisfiability would be undecidable. It is not hard to see that we can decide the satisfiability problem for this fragment.
Consider the data test expressions of the types

$$\neg\langle\alpha_b = \beta_f\rangle \quad\text{and}\quad \neg\langle\alpha_b \neq \beta_f\rangle$$

where $\beta_f \in \mathrm{regXPath}(\mathfrak{F}, =)$ and $\alpha_b \in \mathrm{regXPath}(\mathfrak{B})$. We can decide the satisfaction of these kinds of expressions by means of $\mathrm{spread}(\,\cdot\,, \cdot\,)$, carefully using its first parameter to select the desired threads from which to collect the data values we are interested in. Intuitively, along the dvp-run we throw threads that save the current data value and try out all possible ways to verify $\alpha_b^{r} \in \mathrm{regXPath}(\mathfrak{F}, =)$, where $(\,\cdot\,)^{r}$ stands for the reverse of the regular expression. Let the automaton arrive with a thread $((\!|\alpha_b|\!), d)$ whenever $\alpha_b^{r}$ is verified. This signals that there is a backwards path from the current node in the relation $\alpha_b$ that arrives at a node with data value $d$. Hence, at any given position, the instruction $\mathrm{spread}((\!|\alpha_b|\!), (\!|\beta_f|\!)^{\neg\circledast})$ correctly translates the expression $\neg\langle\alpha_b \circledast \beta_f\rangle$. Furthermore, $\alpha_b$ need not necessarily be in $\mathrm{regXPath}(\mathfrak{B})$, as its intermediate node tests can be formulas from $\mathrm{regXPath}(\mathfrak{F}, =)$. We then obtain the following result.

Remark 6.11. Satisfiability for $\mathrm{regXPath}^{\mathfrak{B}}(\mathfrak{F}, =)$ under key constraints and DTDs is decidable.

6.6. Allowing stronger data tests. Consider the property "there are three descendant nodes labeled $a$, $b$ and $c$ with the same data value". That is, there exists some data value $d$ such that there are three nodes accessible by ${\downarrow^*}[a]$, ${\downarrow^*}[b]$ and ${\downarrow^*}[c]$ respectively, all carrying the datum $d$. Let us denote the fact that they have the same or different datum by introducing the symbols '$\sim$' and '$\not\sim$', and appending them at the end of the path. Then in this case we write that the elements must satisfy ${\downarrow^*}[a]{\sim}$, ${\downarrow^*}[b]{\sim}$, and ${\downarrow^*}[c]{\sim}$. We then introduce the node expression $\{\!|\alpha_1 s_1, \dots, \alpha_n s_n|\!\}$ where $\alpha_i$ is a path expression and $s_i \in \{\sim, \not\sim\}$ for all $i \in [1..n]$. Semantically, it is a node expression that denotes all the tree positions $x$ from which we can access $n$ positions $x_1, \dots, x_n$ such that there exists $d \in \mathbb{D}$ where for all $i \in [n]$ the following holds: $(x, x_i) \in [\![\alpha_i]\!]$; if $s_i = {\sim}$ then $\mathbf{d}(x_i) = d$; and if $s_i = {\not\sim}$ then $\mathbf{d}(x_i) \neq d$. Note that now we can express $\langle\alpha = \beta\rangle$ as $\{\!|\alpha{\sim}, \beta{\sim}|\!\}$ and $\langle\alpha \neq \beta\rangle$ as $\{\!|\alpha{\sim}, \beta{\not\sim}|\!\}$. Let us call $\mathrm{regXPath}^{+}(\mathfrak{F}, =)$ the extension of $\mathrm{regXPath}(\mathfrak{F}, =)$ with the construction just explained. This is a more expressive formalism, since the first mentioned property (or, to give another example, $\{\!|{\downarrow^*}[a]{\sim}, {\downarrow^*}[b]{\sim}, {\downarrow^*}[a]{\not\sim}, {\downarrow^*}[b]{\not\sim}|\!\}$) is not expressible in $\mathrm{regXPath}(\mathfrak{F}, =)$.
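The semantics of the new node expression at a fixed position can be illustrated with a short Python sketch. It assumes that, for each path expression, the set of data values of the positions reachable by it has already been computed; the encoding of constraints as pairs `(reachable_data, sign)` with sign `'~'` or `'!~'`, integer data values, and the function name are all hypothetical.

```python
def multi_test_holds(constraints):
    """Is there a data value d such that every '~' path reaches a node
    with datum d and every '!~' path reaches a node with datum != d?"""
    candidates = set().union(*(data for data, _ in constraints))
    # One value outside the tree represents all values not occurring in it.
    fresh = max(candidates, default=0) + 1
    for d in candidates | {fresh}:
        if all((d in data) if sign == '~' else any(e != d for e in data)
               for data, sign in constraints):
            return True
    return False
```

Although the data domain is infinite, only the values occurring in the given sets plus a single fresh one need to be tried, since all values absent from the tree behave alike. With two `'~'` constraints the function coincides with the equality test, and with one `'~'` and one `'!~'` constraint with the inequality test.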

We argue that satisfiability for this extension can be decided in the same way as for $\mathrm{regXPath}(\mathfrak{F}, =)$. It is straightforward to see that positive appearances can easily be translated with the help of the guess operator. On the other hand, for negative appearances, like $\neg\{\!|\alpha_1 s_1, \dots, \alpha_n s_n|\!\}$, we proceed in the same way as we did for $\mathrm{regXPath}(\mathfrak{F}, =)$. The only difference is that in this case the automaton will simulate the simultaneous evaluation of the $n$ expressions and calculate all possible configurations of the closest common ancestors of the endpoints, performing a spread at each of these intermediate points.

Remark 6.12. Satisfiability of $\mathrm{regXPath}^{+}(\mathfrak{F}, =)$ under key constraints and DTDs is decidable.

7. Concluding remarks 

We presented a simplified framework to work with one-way alternating register automata on data words and trees, which makes it easy to show decidability of new operators by proving that they preserve the downward compatibility of a well-structured transition system. It would hence be interesting to investigate more decidable extensions, and to study the expressiveness limits of decidable logics and automata for data trees.

Also, this work argues in favor of exploring computational models that, although they might not be closed under all boolean operations, can serve to show decidability of logics closed under negation (such as forward-XPath) or of expressive natural extensions of existing logics (such as $\mathrm{LTL}^{\downarrow}_{nf}(\mathfrak{F}, \exists_{\geq}, \forall_{\leq})$).

We finally mention that even though XPath(|, 4*, — )•, — =) {i.e., forward-XPath) has a 
non-primitive-recursive complexity, the results of [Hj suggest that it seems plausible that 
XPath(|, I* -)-* =) or even XPath(|, |* — ;>* , =) are decidable in elementary time (see O 
Conjecture 1]). 

References 

[1] Parosh Aziz Abdulla, Karlis Cerans, Bengt Jonsson, and Yih-Kuen Tsay. General decidability theorems 
for infinite-state systems. In Annual IEEE Symposium on Logic in Computer Science (LICS'96), pages 
313-321, 1996. 

[2] Parosh Aziz Abdulla, Johann Deneux, Joel Ouaknine, and James Worrell. Decidability and complexity 
results for timed automata via channel machines. In International Colloquium on Automata, Languages 
and Programming (ICALP'05), pages 1089-1101, 2005. 

[3] Rajeev Alur and David L. Dill. A theory of timed automata. Theoretical Computer Science, 126:183-235, 
1994. 

[4] Michael Benedikt, Wenfei Fan, and Floris Geerts. XPath satisfiability in the presence of DTDs. Journal of the ACM, 55(2):1-79, 2008.

[5] Mikolaj Bojanczyk, Anca Muscholl, Thomas Schwentick, and Luc Segoufin. Two-variable logic on data trees and XML reasoning. Journal of the ACM, 56(3):1-48, 2009.

[6] Pierre Chambart and Philippe Schnoebelen. The ordinal recursive complexity of lossy channel systems. In Annual IEEE Symposium on Logic in Computer Science (LICS'08), pages 205-216. IEEE Computer Society Press, 2008.

[7] James Clark and Steve DeRose. XML path language (XPath). Website, 1999. W3C Recommendation, http://www.w3.org/TR/xpath

[8] Stephane Demri and Ranko Lazic. LTL with the freeze quantifier and register automata. ACM Transactions on Computational Logic, 10(3), 2009.

[9] Stephane Demri, Ranko Lazic, and David Nowak. On the freeze quantifier in constraint LTL: Decidability and complexity. In International Symposium on Temporal Representation and Reasoning (TIME'05), pages 113-121. IEEE Computer Society Press, 2005.

[10] Leonard E. Dickson. Finiteness of the odd perfect and primitive abundant numbers with n distinct 
prime factors. The American Journal of Mathematics, 35(4):413-422, 1913. 

[11] Diego Figueira. Satisfiability of downward XPath with data equality tests. In ACM Symposium on 
Principles of Database Systems (PODS'09), pages 197-206. ACM Press, 2009. 

[12] Diego Figueira. Forward-XPath and extended register automata on data-trees. In International Conference on Database Theory (ICDT'10). ACM Press, 2010.

[13] Diego Figueira. Reasoning on Words and Trees with Data. Ph.D. thesis, Laboratoire Specification et 
Verification, ENS Cachan, France, December 2010. 

[14] Diego Figueira. A decidable two-way logic on data words. In Annual IEEE Symposium on Logic in
Computer Science (LICS'11). IEEE Computer Society Press, 2011.

[15] Diego Figueira, Santiago Figueira, Sylvain Schmitz, and Philippe Schnoebelen. Ackermannian and
primitive-recursive bounds with Dickson's lemma. In Annual IEEE Symposium on Logic in Computer
Science (LICS'11). IEEE Computer Society Press, 2011.

[16] Diego Figueira, Piotr Hofman, and Sławomir Lasota. Relating timed and register automata. In
International Workshop on Expressiveness in Concurrency (EXPRESS'10), 2010.

[17] Diego Figueira and Luc Segoufin. Future-looking logics on data words and trees. In International Sym- 
posium on Mathematical Foundations of Computer Science (MFCS'09), volume 5734 of LNCS, pages 
331-343. Springer, 2009. 

[18] Diego Figueira and Luc Segoufin. Bottom-up automata on data trees and vertical XPath. In
International Symposium on Theoretical Aspects of Computer Science (STACS'11). Springer, 2011.

[19] Alain Finkel and Philippe Schnoebelen. Well-structured transition systems everywhere! Theoretical
Computer Science, 256(1-2):63-92, 2001.

[20] Floris Geerts and Wenfei Fan. Satisfiability of XPath queries with sibling axes. In International Sym- 
posium on Database Programming Languages (DBPL'05), volume 3774 of Lecture Notes in Computer 
Science, pages 122-137. Springer, 2005. 

[21] Georg Gottlob, Christoph Koch, and Reinhard Pichler. Efficient algorithms for processing XPath 
queries. ACM Transactions on Database Systems, 30(2):444-491, 2005. 



[22] Graham Higman. Ordering by divisibility in abstract algebras. Proceedings of the London Mathematical
Society (3), 2(7):326-336, 1952.

[23] Marcin Jurdziński and Ranko Lazić. Alternation-free modal mu-calculus for data trees. In Annual IEEE
Symposium on Logic in Computer Science (LICS'07), pages 131-140. IEEE Computer Society Press,
2007.

[24] Marcin Jurdziński and Ranko Lazić. Alternating automata on data trees and XPath satisfiability. ACM
Transactions on Computational Logic, 12(3):19, 2011.

[25] Joseph B. Kruskal. Well-quasi-ordering, the tree theorem, and Vázsonyi's conjecture. Transactions of
the American Mathematical Society, 95(2):210-225, 1960.

[26] M.H. Löb and S.S. Wainer. Hierarchies of number theoretic functions, I. Archiv für Mathematische Logik
und Grundlagenforschung, 13:39-51, 1970.

[27] Maarten Marx. XPath with conditional axis relations. In International Conference on Extending Data- 
base Technology (EDBT'04), volume 2992 of Lecture Notes in Computer Science, pages 477-494. 
Springer, 2004. 

[28] Philippe Schnoebelen. Revisiting Ackermann-hardness for lossy counter machines and reset Petri nets.
In International Symposium on Mathematical Foundations of Computer Science (MFCS'10), volume
6281 of Lecture Notes in Computer Science, pages 616-628. Springer, 2010.

[29] Balder ten Cate. The expressivity of XPath with transitive closure. In ACM Symposium on Principles
of Database Systems (PODS'06), pages 328-337. ACM Press, 2006.

[30] Balder ten Cate and Luc Segoufin. XPath, transitive closure logic, and nested tree walking automata.
In ACM Symposium on Principles of Database Systems (PODS'08), pages 251-260. ACM Press, 2008.



This work is licensed under the Creative Commons Attribution-NoDerivs License. To view
a copy of this license, visit http://creativecommons.org/licenses/by-nd/2.0/ or send a
letter to Creative Commons, 171 Second St, Suite 300, San Francisco, CA 94105, USA, or
Eisenacher Strasse 2, 10777 Berlin, Germany.



